Paper Title
Learning Dense Rewards for Contact-Rich Manipulation Tasks
Paper Authors
Paper Abstract
Rewards play a crucial role in reinforcement learning. To arrive at the desired policy, the design of a suitable reward function often requires significant domain expertise as well as trial-and-error. Here, we aim to minimize the effort involved in designing reward functions for contact-rich manipulation tasks. In particular, we provide an approach capable of extracting dense reward functions algorithmically from robots' high-dimensional observations, such as images and tactile feedback. In contrast to state-of-the-art high-dimensional reward learning methods, our approach does not leverage adversarial training and is thus less prone to the associated training instabilities. Instead, our approach learns rewards by estimating task progress in a self-supervised manner. We demonstrate the effectiveness and efficiency of our approach on two contact-rich manipulation tasks, namely, peg-in-hole and USB insertion. The experimental results indicate that the policies trained with the learned reward function achieve better performance and faster convergence compared to the baselines.
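To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of progress-based reward learning. It is not the paper's architecture: the names (ProgressEstimator, train_on_trajectory, dense_reward) are illustrative, observations are simplified to flat feature vectors rather than raw images and tactile signals, and the self-supervised label is assumed to be the normalized timestep of a demonstration frame, a common proxy for task progress.

```python
import torch
import torch.nn as nn

class ProgressEstimator(nn.Module):
    """Maps an observation embedding to a task-progress estimate in [0, 1]."""

    def __init__(self, obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # progress is bounded in [0, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def train_on_trajectory(model, optimizer, traj):
    """One self-supervised update: label frame t of a T-step
    trajectory with t / (T - 1), an assumed proxy for task progress."""
    T = traj.shape[0]
    targets = torch.linspace(0.0, 1.0, T)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(traj), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

def dense_reward(model, obs):
    """Use the predicted progress directly as a dense reward signal."""
    with torch.no_grad():
        return model(obs.unsqueeze(0)).item()

# Example usage with synthetic data standing in for observation embeddings.
model = ProgressEstimator(obs_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
demo = torch.randn(50, 32)  # a 50-step "demonstration" of 32-d features
for _ in range(100):
    train_on_trajectory(model, opt, demo)
print(dense_reward(model, demo[-1]))  # late frames should score near 1
```

A policy trained against such a reward receives informative feedback at every step rather than only at task completion, which is the sense in which the learned reward is dense; how the progress labels and observation encoders are actually constructed is specific to the paper's method.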