Paper Title
DayDreamer: World Models for Physical Robot Learning
Paper Authors
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
Paper Abstract
To solve tasks in complex environments, robots need to learn from experience. Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn, limiting its deployment in the physical world. As a consequence, many advances in robot learning rely on simulators. On the other hand, learning inside of simulators fails to capture the complexity of the real world, is prone to simulator inaccuracies, and the resulting behaviors do not adapt to changes in the world. The Dreamer algorithm has recently shown great promise for learning from small amounts of interaction by planning within a learned world model, outperforming pure reinforcement learning in video games. Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment. However, it is unknown whether Dreamer can facilitate faster learning on physical robots. In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators. Dreamer trains a quadruped robot to roll off its back, stand up, and walk from scratch and without resets in only 1 hour. We then push the robot and find that Dreamer adapts within 10 minutes to withstand perturbations or quickly roll over and stand back up. On two different robotic arms, Dreamer learns to pick and place multiple objects directly from camera images and sparse rewards, approaching human performance. On a wheeled robot, Dreamer learns to navigate to a goal position purely from camera images, automatically resolving ambiguity about the robot orientation. Using the same hyperparameters across all experiments, we find that Dreamer is capable of online learning in the real world, establishing a strong baseline. We release our infrastructure for future applications of world models to robot learning.
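To make the training loop the abstract describes more concrete, below is a minimal, self-contained Python sketch of a world-model agent: real-robot experience fills a replay buffer, the world model is trained on that buffer, and the policy is improved on imagined latent rollouts rather than on additional real interaction. This is an illustrative sketch only, not the released DayDreamer implementation; all names (`WorldModel`, `Actor`, `env_step`, the dimensions, and the update ratio) are hypothetical stand-ins, and the learned components are replaced by random linear maps so the control flow runs as-is. The actual agent uses a recurrent state-space model and a neural actor-critic trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, ACTION_DIM, HORIZON = 16, 8, 2, 15  # illustrative sizes


class WorldModel:
    """Toy stand-in: encodes observations to latents and predicts
    latent dynamics and reward. In Dreamer these are neural networks."""

    def __init__(self):
        self.enc = 0.1 * rng.normal(size=(OBS_DIM, LATENT_DIM))
        self.dyn = 0.1 * rng.normal(size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))
        self.rew = 0.1 * rng.normal(size=LATENT_DIM)

    def encode(self, obs):
        return np.tanh(obs @ self.enc)

    def imagine_step(self, latent, action):
        nxt = np.tanh(np.concatenate([latent, action]) @ self.dyn)
        return nxt, float(nxt @ self.rew)

    def update(self, batch):
        pass  # gradient step on reconstruction/reward/dynamics losses (omitted)


class Actor:
    """Toy stand-in for the policy network."""

    def __init__(self):
        self.w = 0.1 * rng.normal(size=(LATENT_DIM, ACTION_DIM))

    def act(self, latent):
        return np.tanh(latent @ self.w)

    def update(self, imagined_return):
        pass  # gradient step to maximize imagined returns (omitted)


def imagine_rollout(model, actor, start_latent):
    """Unroll the policy inside the world model: no real-robot samples used."""
    latent, total = start_latent, 0.0
    for _ in range(HORIZON):
        action = actor.act(latent)
        latent, reward = model.imagine_step(latent, action)
        total += reward
    return total


def env_step(action):
    """Hypothetical stand-in for one real-robot control step."""
    return rng.normal(size=OBS_DIM), 0.0


model, actor, replay = WorldModel(), Actor(), []
obs = rng.normal(size=OBS_DIM)
for step in range(100):  # real-world interaction loop
    latent = model.encode(obs)
    next_obs, reward = env_step(actor.act(latent))
    replay.append((obs, actor.act(latent), reward, next_obs))
    obs = next_obs
    # Several learning updates per real step: the high ratio of imagined
    # to real experience is what makes the approach sample-efficient.
    for _ in range(4):
        idx = rng.integers(len(replay), size=min(32, len(replay)))
        batch = [replay[i] for i in idx]
        model.update(batch)
        actor.update(imagine_rollout(model, actor, model.encode(batch[0][0])))
```

The key design point is the inner loop: each real-world step funds multiple model and actor updates on imagined trajectories, which is how the abstract's sample-efficiency claims (e.g., a quadruped learning to walk in about an hour of real experience) become plausible.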