Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors

Paper Title

Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors

Authors

Shikha Surana, Bryan Lim, Antoine Cully

Abstract

Data-driven, learning-based methods have recently been particularly successful at learning robust locomotion controllers for a variety of unstructured terrains. Prior work has shown that incorporating good locomotion priors in the form of trajectory generators (TGs) is an effective way to learn complex locomotion skills efficiently. However, defining a single good TG becomes harder as tasks and environments grow more complex, because it requires extensive tuning and risks reducing the effectiveness of the prior. In this paper, we present Evolved Environmental Trajectory Generators (EETG), a method that learns a diverse set of specialised locomotion priors using Quality-Diversity algorithms while maintaining a single policy within the Policies Modulating TG (PMTG) architecture. The results demonstrate that EETG enables a quadruped robot to successfully traverse a wide range of environments, such as slopes, stairs, rough terrain, and balance beams. Our experiments show that learning a diverse set of specialised TG priors is significantly (5 times) more efficient than using a single, fixed prior when dealing with a wide range of environments.
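The idea of keeping one specialised TG prior per environment can be illustrated with a toy sketch. This is not the authors' EETG implementation: it is a minimal MAP-Elites-style loop, assuming a two-parameter TG (amplitude, frequency), an archive with one cell per environment, and a made-up fitness function that stands in for a simulated rollout. The names `TOY_TARGETS`, `toy_fitness`, and `evolve_tg_archive` are hypothetical, and the single PMTG policy that would modulate the TG is left out.

```python
import math
import random

# Environment names come from the abstract. The "ideal" (amplitude, frequency)
# pairs are invented so that the toy fitness has something to optimise.
TOY_TARGETS = {
    "slope": (0.10, 1.5),
    "stairs": (0.18, 1.0),
    "rough_terrain": (0.14, 1.2),
    "balance_beam": (0.06, 0.8),
}


def trajectory_generator(params, t):
    """Open-loop foot-height prior: a rectified sinusoid over time t."""
    amplitude, frequency = params
    return amplitude * max(0.0, math.sin(2.0 * math.pi * frequency * t))


def toy_fitness(params, env):
    """Stand-in for a rollout: negative squared distance to the toy target."""
    target = TOY_TARGETS[env]
    return -sum((p - q) ** 2 for p, q in zip(params, target))


def evolve_tg_archive(iterations=2000, sigma=0.05, seed=0):
    """Return {environment: (tg_params, fitness)} after a simple QD-style search."""
    rng = random.Random(seed)
    envs = list(TOY_TARGETS)

    # Initialise one random elite per environment cell.
    archive = {}
    for env in envs:
        params = (rng.uniform(0.0, 0.3), rng.uniform(0.5, 2.0))
        archive[env] = (params, toy_fitness(params, env))

    for _ in range(iterations):
        # Select any elite as parent and mutate it with Gaussian noise.
        parent, _ = archive[rng.choice(envs)]
        child = tuple(p + rng.gauss(0.0, sigma) for p in parent)

        # Evaluate the child in a sampled environment; keep it only if it
        # beats the elite currently stored for that environment.
        env = rng.choice(envs)
        fitness = toy_fitness(child, env)
        if fitness > archive[env][1]:
            archive[env] = (child, fitness)

    return archive
```

Calling `evolve_tg_archive()` returns one parameter pair per environment. In a real system, the toy fitness would be replaced by a simulated rollout of the robot in that environment.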
