Instruction-driven history-aware policies for robotic manipulations

Paper title

Instruction-driven history-aware policies for robotic manipulations

Authors

Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid

Abstract

In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.
