Open-Ended Learning Strategies for Learning Complex Locomotion Skills

Paper Title

Open-Ended Learning Strategies for Learning Complex Locomotion Skills

Authors

Fangqin Zhou, Joaquin Vanschoren

Abstract

Teaching robots to learn diverse locomotion skills under complex three-dimensional environmental settings via Reinforcement Learning (RL) is still challenging. It has been shown that training agents in simple settings before moving them on to complex settings improves the training process, but so far only in the context of relatively simple locomotion skills. In this work, we adapt the Enhanced Paired Open-Ended Trailblazer (ePOET) approach to train more complex agents to walk efficiently on complex three-dimensional terrains. First, to generate more rugged and diverse three-dimensional training terrains with increasing complexity, we extend the Compositional Pattern Producing Networks - Neuroevolution of Augmenting Topologies (CPPN-NEAT) approach and include randomized shapes. Second, we combine ePOET with Soft Actor-Critic off-policy optimization, yielding ePOET-SAC, to ensure that the agent could learn more diverse skills to solve more challenging tasks. Our experimental results show that the newly generated three-dimensional terrains have sufficient diversity and complexity to guide learning, that ePOET successfully learns complex locomotion skills on these terrains, and that our proposed ePOET-SAC approach slightly improves upon ePOET.
