Paper Title
Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning
Paper Authors
Paper Abstract
In recent years, reinforcement learning (RL) has shown outstanding performance for locomotion control of highly articulated robotic systems. Such approaches typically involve tedious reward function tuning to achieve the desired motion style. Imitation learning approaches such as adversarial motion priors aim to reduce this problem by encouraging a pre-defined motion style. In this work, we present an approach to augment the concept of adversarial motion prior-based RL to allow for multiple, discretely switchable styles. We show that multiple styles and skills can be learned simultaneously without notable performance differences, even in combination with motion data-free skills. Our approach is validated in several real-world experiments with a wheeled-legged quadruped robot showing skills learned from existing RL controllers and trajectory optimization, such as ducking and walking, and novel skills such as switching between a quadrupedal and humanoid configuration. For the latter skill, the robot is required to stand up, navigate on two wheels, and sit down. Instead of tuning the sit-down motion, we verify that a reverse playback of the stand-up movement helps the robot discover feasible sit-down behaviors and avoids tedious reward function tuning.
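The multi-style mechanism described in the abstract lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendering of multiple adversarial motion priors: one least-squares discriminator per style, with a discrete style index selecting which discriminator scores a state transition. The class name `MultiAMP`, the network sizes, and the exact reward shaping follow common AMP practice and are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MultiAMP(nn.Module):
    """Hypothetical sketch: one least-squares discriminator per motion style."""

    def __init__(self, obs_dim: int, n_styles: int, hidden: int = 256):
        super().__init__()
        # Each discriminator scores state transitions (s, s') against
        # reference transitions from its own style's motion dataset.
        self.discriminators = nn.ModuleList([
            nn.Sequential(
                nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_styles)
        ])

    def style_reward(self, s: torch.Tensor, s_next: torch.Tensor,
                     style_id: torch.Tensor) -> torch.Tensor:
        # style_id (batch,) selects the active discriminator per environment,
        # mirroring the discretely switchable styles described above.
        x = torch.cat([s, s_next], dim=-1)
        scores = torch.stack([d(x).squeeze(-1) for d in self.discriminators],
                             dim=-1)
        d_val = scores.gather(-1, style_id.unsqueeze(-1)).squeeze(-1)
        # Least-squares GAN reward form commonly used with AMP; during
        # training it would be combined with the task reward.
        return torch.clamp(1.0 - 0.25 * (d_val - 1.0) ** 2, min=0.0)
```

In such a setup, each discriminator would only be trained on reference data of its own style, so switching `style_id` at run time switches which motion prior shapes the policy's behavior.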
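The reverse-playback idea for the sit-down motion can also be sketched briefly. The following NumPy snippet shows one way a recorded stand-up trajectory could be turned into a sit-down reference clip; the function name `reversed_reference` and the finite-difference velocity recovery are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def reversed_reference(stand_up: np.ndarray, dt: float):
    """Turn a recorded stand-up trajectory of shape (T, d) into a
    sit-down reference.

    Reversing the frame order yields the sit-down positions; re-deriving
    velocities via finite differences flips their sign automatically, so
    no manual negation is required.
    """
    sit_down = stand_up[::-1].copy()
    sit_down_vel = np.gradient(sit_down, dt, axis=0)
    return sit_down, sit_down_vel
```

The resulting clip could then serve as reference data for a sit-down motion prior, avoiding the reward tuning the abstract describes; dynamic feasibility of the reversed motion would still be left for the policy to discover.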