Paper Title

Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion

Paper Authors

Siddhant Gangapurwala, Alexander Mitchell, Ioannis Havoutis

Paper Abstract

Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably in real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and non-deterministic response, existing RL methods do not guarantee behavior within the required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes that encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against an unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
