Paper Title

Graph Inverse Reinforcement Learning from Diverse Videos

Paper Authors

Sateesh Kumar, Jonathan Zamora, Nicklas Hansen, Rishabh Jangir, Xiaolong Wang

Paper Abstract

Research on Inverse Reinforcement Learning (IRL) from third-person videos has shown encouraging results on removing the need for manual reward design for robotic tasks. However, most prior works are still limited by training from a relatively restricted domain of videos. In this paper, we argue that the true potential of third-person IRL lies in increasing the diversity of videos for better scaling. To learn a reward function from diverse videos, we propose to perform graph abstraction on the videos followed by temporal matching in the graph space to measure the task progress. Our insight is that a task can be described by entity interactions that form a graph, and this graph abstraction can help remove irrelevant information such as textures, resulting in more robust reward functions. We evaluate our approach, GraphIRL, on cross-embodiment learning in X-MAGICAL and learning from human demonstrations for real-robot manipulation. We show significant improvements in robustness to diverse video demonstrations over previous approaches, and even achieve better results than manual reward design on a real robot pushing task. Videos are available at https://sateeshkumar21.github.io/GraphIRL.
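The abstract outlines two core components: a graph abstraction that represents each video frame as a set of interacting entities, and temporal matching in the learned graph space to measure task progress. The sketch below illustrates that idea in PyTorch under simplifying assumptions. The names `GraphEncoder` and `task_progress_reward`, the pairwise-interaction encoder, and the nearest-neighbor progress reward are illustrative stand-ins, not the paper's implementation; in particular, the temporal alignment objective used to train the embedding is omitted here.

```python
# Minimal sketch of a graph-abstraction reward, assuming entities (e.g., object
# positions) have already been extracted from each video frame. Illustrative
# only; not the authors' GraphIRL implementation.

import torch
import torch.nn as nn


class GraphEncoder(nn.Module):
    """Embeds a set of entity features and their pairwise relations into one vector."""

    def __init__(self, entity_dim: int, hidden_dim: int = 64, out_dim: int = 32):
        super().__init__()
        # Edge model: consumes a pair of entity features (captures interactions).
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * entity_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Readout: aggregated edge messages -> frame-level graph embedding.
        self.readout = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (num_entities, entity_dim). The abstraction keeps only
        # entity state, discarding appearance details such as textures.
        n = entities.shape[0]
        src = entities.unsqueeze(1).expand(n, n, -1)
        dst = entities.unsqueeze(0).expand(n, n, -1)
        pairs = torch.cat([src, dst], dim=-1).reshape(n * n, -1)
        messages = self.edge_mlp(pairs)
        # Mean pooling is permutation-invariant over entity pairs.
        return self.readout(messages.mean(dim=0))


def task_progress_reward(obs_emb: torch.Tensor, demo_embs: torch.Tensor) -> float:
    """Temporal matching in graph space: reward the progress of the nearest
    demonstration frame, penalized by the distance to it.

    demo_embs: (T, out_dim) embeddings of one demonstration, frame by frame.
    """
    dists = torch.norm(demo_embs - obs_emb, dim=-1)
    t = int(torch.argmin(dists))
    progress = t / (demo_embs.shape[0] - 1)  # 0 at the start, 1 at completion
    return progress - float(dists[t])


if __name__ == "__main__":
    # Toy usage: 3 entities with 2-D positions, a 10-frame demonstration.
    enc = GraphEncoder(entity_dim=2)
    demo = torch.stack([enc(torch.rand(3, 2)) for _ in range(10)])
    print(task_progress_reward(enc(torch.rand(3, 2)), demo))
```

Because the encoder sees only entity states and their pairwise relations, the same reward can in principle score demonstrations from different embodiments or visual domains, which is the scaling argument the abstract makes.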
