Paper Title


Following Instructions by Imagining and Reaching Visual Goals

Authors

John Kanu, Eadom Dessalene, Xiaomin Lin, Cornelia Fermuller, Yiannis Aloimonos

Abstract


While traditional methods for instruction-following typically assume prior linguistic and perceptual knowledge, many recent works in reinforcement learning (RL) have proposed learning policies end-to-end, typically by training neural networks to map joint representations of observations and instructions directly to actions. In this work, we present a novel framework for learning to perform temporally extended tasks using spatial reasoning in the RL framework, by sequentially imagining visual goals and choosing appropriate actions to fulfill the imagined goals. Our framework operates on raw pixel images, assumes no prior linguistic or perceptual knowledge, and learns via intrinsic motivation and a single extrinsic reward signal measuring task completion. We validate our method on two environments with a robot arm in an interactive 3D simulation. Our method outperforms two flat architectures with raw-pixel and ground-truth states, as well as a hierarchical architecture with ground-truth states, on object arrangement tasks.
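The abstract's high-level control loop — sequentially imagining a visual goal, then acting to fulfill it — can be sketched as a simple hierarchy. This is a minimal illustrative sketch, not the paper's implementation: `imagine_goal` and `goal_reaching_action` are hypothetical stand-ins for the learned generative goal model and the learned goal-conditioned policy, and the "environment step" is a toy update on an array standing in for a pixel image.

```python
import numpy as np

rng = np.random.default_rng(0)

def imagine_goal(observation, instruction):
    # Hypothetical stand-in for the high-level "imagination" module:
    # in the paper this would be a learned model proposing a goal image
    # conditioned on the observation and the instruction. Here we just
    # perturb the current observation.
    return np.clip(observation + 0.1 * rng.standard_normal(observation.shape), 0.0, 1.0)

def goal_reaching_action(observation, goal):
    # Hypothetical stand-in for the low-level goal-conditioned policy:
    # produces an action that moves the state toward the imagined goal.
    return 0.5 * (goal - observation)

def run_episode(observation, instruction, n_goals=5, steps_per_goal=3):
    # Hierarchical loop: imagine a visual goal, act for a few steps to
    # fulfill it, then imagine the next goal, and so on.
    for _ in range(n_goals):
        goal = imagine_goal(observation, instruction)
        for _ in range(steps_per_goal):
            action = goal_reaching_action(observation, goal)
            observation = observation + action  # toy environment transition
    return observation

obs = rng.random((8, 8))  # stand-in for a raw pixel observation
final_obs = run_episode(obs, instruction="place the red block on the left")
```

In the actual framework both modules are trained with RL (intrinsic motivation plus a single extrinsic task-completion reward); the sketch only shows how the two levels interleave.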
