Paper Title


Deep Reinforcement Learning for Contact-Rich Skills Using Compliant Movement Primitives

Authors

Spector, Oren, Zacksenhouse, Miriam

Abstract


In recent years, industrial robots have been installed in various industries to handle advanced manufacturing and high-precision tasks. However, further integration of industrial robots is hampered by their limited flexibility, adaptability, and decision-making skills compared to human operators. Assembly tasks are especially challenging for robots since they are contact-rich and sensitive to even small uncertainties. While reinforcement learning (RL) offers a promising framework for learning contact-rich control policies from scratch, its applicability to high-dimensional continuous state-action spaces remains rather limited due to high brittleness and sample complexity. To address those issues, we propose different pruning methods that facilitate convergence and generalization. In particular, we divide the task into free and contact-rich sub-tasks, perform the control in Cartesian rather than joint space, and parameterize the control policy. Those pruning methods are naturally implemented within the framework of dynamic movement primitives (DMP). To handle contact-rich tasks, we extend the DMP framework by introducing a coupling term that acts like the human wrist and provides active compliance under contact with the environment. We demonstrate that the proposed method can learn insertion skills that are invariant to space, size, and shape, and that generalize to closely related scenarios, while handling large uncertainties. Finally, we demonstrate that the learned policy can be easily transferred from simulation to the real world and achieves similar performance on a UR5e robot.
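To make the DMP extension concrete, below is a minimal sketch of a standard one-dimensional discrete DMP (Ijspeert-style transformation and canonical systems) with an additive coupling term slotted into the transformation system. The `coupling` callback is a hypothetical placeholder for the paper's wrist-like compliance term, and the parameter values are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def run_dmp(y0, g, T=1.0, dt=0.001, tau=1.0, alpha=25.0, beta=6.25,
            alpha_x=4.0, forcing=None, coupling=None):
    """Integrate a 1-D discrete DMP and return the position trajectory.

    Transformation system:  tau * z_dot = alpha*(beta*(g - y) - z) + f(x) + Ct
                            tau * y_dot = z
    Canonical system:       tau * x_dot = -alpha_x * x

    Ct is an additive coupling term (a user-supplied callback here); in the
    paper's extension it would inject wrist-like active compliance driven by
    sensed contact forces.
    """
    x, y, z = 1.0, float(y0), 0.0  # phase, position, scaled velocity
    traj = [y]
    for _ in range(int(round(T / dt))):
        f = forcing(x) if forcing else 0.0        # learned forcing term
        ct = coupling(y, z) if coupling else 0.0  # compliance coupling Ct
        z_dot = (alpha * (beta * (g - y) - z) + f + ct) / tau
        y_dot = z / tau
        x += (-alpha_x * x / tau) * dt  # phase decays, forcing vanishes at goal
        z += z_dot * dt
        y += y_dot * dt
        traj.append(y)
    return np.array(traj)
```

With zero forcing and zero coupling the system is a critically damped spring toward the goal, so `run_dmp(0.0, 1.0)[-1]` converges to the goal `g = 1.0`; a nonzero `coupling` perturbs the attractor dynamics without changing the goal-convergence structure, which is what makes the coupling-term formulation attractive for contact.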
