Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Paper title

Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Authors

Nikita Rudin, David Hoeller, Marko Bjelonic, Marco Hutter

Abstract

The common approach for local navigation in challenging environments with legged robots consists of path planning, path following and locomotion, which usually requires a locomotion control policy that accurately tracks a commanded velocity. However, by breaking down the navigation problem into these sub-tasks, we limit the robot's capabilities since the individual tasks do not consider the full solution space. In this work, we propose to solve the complete problem by training an end-to-end policy with deep reinforcement learning. Instead of continuously tracking a precomputed path, the robot needs to reach a target position within a provided time. The task's success is only evaluated at the end of an episode, meaning that the policy does not need to reach the target as fast as possible. It is free to select its path and the locomotion gait. Training a policy in this way opens up a larger set of possible solutions, which allows the robot to learn more complex behaviors. We compare our approach to velocity tracking and additionally show that the time dependence of the task reward is critical to successfully learn these new behaviors. Finally, we demonstrate the successful deployment of policies on a real quadrupedal robot. The robot is able to cross challenging terrains that it could not traverse previously, while using a more energy-efficient gait and achieving a higher success rate.
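
The abstract states that success is judged only at the end of an episode and that the time dependence of the task reward is critical, but it gives no reward formula. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a position reward that stays at zero until the last `reward_window` seconds of the episode and is shaped as 1/(1 + d^2) in the distance d to the target, and contrasts it with a dense velocity-tracking reward of the common exp(-error^2/sigma) form. All function and parameter names (`task_reward`, `velocity_tracking_reward`, `episode_return`, `reward_window`, `sigma`) are hypothetical.

```python
import math
from typing import Sequence


def task_reward(pos: Sequence[float], target: Sequence[float], t: float,
                episode_len: float, reward_window: float) -> float:
    """Hypothetical time-dependent position reward (assumed form).

    Returns zero until the last `reward_window` seconds of the episode, so
    reaching the target early earns nothing extra.
    """
    if t <= episode_len - reward_window:
        return 0.0
    dist_sq = sum((p - g) ** 2 for p, g in zip(pos, target))
    # Dividing by the window length keeps the reward integrated over the
    # window at most 1 (in continuous time).
    return (1.0 / reward_window) / (1.0 + dist_sq)


def velocity_tracking_reward(vel: Sequence[float], cmd_vel: Sequence[float],
                             sigma: float = 0.25) -> float:
    """Hypothetical dense baseline: rewards matching a commanded velocity at every step."""
    err_sq = sum((v - c) ** 2 for v, c in zip(vel, cmd_vel))
    return math.exp(-err_sq / sigma)


def episode_return(trajectory: Sequence[Sequence[float]], target: Sequence[float],
                   dt: float, episode_len: float, reward_window: float) -> float:
    """Sums task_reward * dt over a trajectory sampled every `dt` seconds."""
    total = 0.0
    for k, pos in enumerate(trajectory):
        t = (k + 1) * dt  # time at the end of step k
        total += task_reward(pos, target, t, episode_len, reward_window) * dt
    return total


if __name__ == "__main__":
    target = (1.0, 0.0)
    steps, dt, episode_len, window = 100, 0.1, 10.0, 2.0
    fast = [target] * steps                                 # at the target from the start
    slow = [(min(1.0, k / 79), 0.0) for k in range(steps)]  # arrives just before the window
    late = [(min(1.0, k / 95), 0.0) for k in range(steps)]  # arrives inside the window
    for name, traj in [("fast", fast), ("slow", slow), ("late", late)]:
        print(name, episode_return(traj, target, dt, episode_len, window))
```

With this assumed shaping, a trajectory that reaches the target immediately and one that arrives just before the reward window opens collect the same return, while one that arrives inside the window collects less. This is one way a time-dependent reward can leave the policy free to choose its path and gait instead of rushing, whereas the velocity-tracking reward pays out at every step and therefore ties the robot to the commanded velocity.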
