Paper title
Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation
Paper authors
Abstract
It has been a long-standing dream to design artificial agents that explore their environment efficiently via intrinsic motivation, similar to how children perform curious free play. Despite recent advances in intrinsically motivated reinforcement learning (RL), sample-efficient exploration in object manipulation scenarios remains a significant challenge, as most of the relevant information lies in the sparse agent-object and object-object interactions. In this paper, we propose to use structured world models to incorporate relational inductive biases in the control loop to achieve sample-efficient and interaction-rich exploration in compositional multi-object environments. By planning for future novelty inside structured world models, our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time. Instead of using models only to compute intrinsic rewards, as commonly done, our method showcases that the self-reinforcing cycle between good models and good exploration also opens up another avenue: zero-shot generalization to downstream tasks via model-based planning. After the entirely intrinsic task-agnostic exploration phase, our method solves challenging downstream tasks such as stacking, flipping, pick & place, and throwing, and it generalizes to unseen numbers and arrangements of objects without any additional training.