Paper Title


Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning

Paper Authors

Wenshuai Zhao, Jorge Peña Queralta, Li Qingqing, Tomi Westerlund

Paper Abstract


Current research directions in deep reinforcement learning include bridging the simulation-to-reality gap, improving the sample efficiency of experiences in distributed multi-agent reinforcement learning, and developing methods that are robust against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can arise from sensing mismatches, inherent errors in the calibration of the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning with proximal policy optimization (PPO). We discuss how both the type of perturbation and the number of agents experiencing it affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. To the best of our knowledge, this is the first work exploring the limitations of PPO in multi-robot systems when different robots might be exposed to different environments where their sensors or actuators have induced errors. With the conclusions of this work, we set the starting point for future work on designing and developing methods to achieve robust reinforcement learning in the presence of real-world perturbations that might differ within a multi-robot system.
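The abstract describes injecting heterogeneous sensing and calibration perturbations into individual robots of a team. The sketch below is an illustration of that idea, not the authors' code: a hypothetical per-agent wrapper (`PerturbedAgentEnv` and all parameter names are assumptions) that adds Gaussian noise to observations (sensing mismatch) and a constant bias to actions (joint-calibration mismatch), so that only some agents in the team experience each perturbation type.

```python
import random


class PerturbedAgentEnv:
    """Hypothetical per-agent wrapper illustrating the two perturbation
    types discussed in the paper: sensing noise on observations and a
    fixed calibration offset on actions. Names are illustrative only."""

    def __init__(self, sensing_noise_std=0.0, calibration_offset=0.0, seed=0):
        self.sensing_noise_std = sensing_noise_std
        self.calibration_offset = calibration_offset
        self.rng = random.Random(seed)

    def perturb_observation(self, obs):
        # Sensing mismatch: zero-mean Gaussian noise on each sensor reading.
        return [x + self.rng.gauss(0.0, self.sensing_noise_std) for x in obs]

    def perturb_action(self, action):
        # Calibration mismatch: constant bias on each joint command.
        return [a + self.calibration_offset for a in action]


# A heterogeneous team: only some agents experience each perturbation,
# mirroring the non-homogeneous operation analyzed in the paper.
team = [
    PerturbedAgentEnv(),                                # nominal agent
    PerturbedAgentEnv(sensing_noise_std=0.05, seed=1),  # noisy sensors
    PerturbedAgentEnv(calibration_offset=0.02),         # miscalibrated joints
]

obs = [0.1, 0.2, 0.3]
action = [0.5, -0.5]
for env in team:
    noisy_obs = env.perturb_observation(obs)
    biased_action = env.perturb_action(action)
```

In a full experiment these perturbed observations and actions would sit between the shared PPO policy and each robot's simulated Kuka arm, which is how the number of affected agents can be varied independently of the perturbation type.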
