Can Foundation Models Perform Zero-Shot Task Specification for Robot Manipulation?

Paper Title

Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

Authors

Yuchen Cui, Scott Niekum, Abhinav Gupta, Vikash Kumar, Aravind Rajeswaran

Abstract

Task specification is at the core of programming autonomous robots. A low-effort modality for task specification is critical for engaging non-expert end-users and for the eventual adoption of personalized robot agents. A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene. The former is hard for non-experts to interpret and necessitates detailed state estimation and scene understanding. The latter requires generating the desired goal image, which often requires a human to complete the task, defeating the purpose of having autonomous robots. In this work, we explore alternative and more general forms of goal specification that are expected to be easier for humans to specify and use, such as images obtained from the internet, hand sketches that provide a visual description of the desired task, or simple language descriptions. As a preliminary step towards this, we investigate the capabilities of large-scale pre-trained models (foundation models) for zero-shot goal specification, and find promising results on a collection of simulated robot manipulation tasks and real-world datasets.
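The abstract does not spell out how a foundation model turns a goal image, hand sketch, or sentence into a signal a robot can act on. One common pattern, assumed here rather than taken from the paper, is to embed the goal and the current camera observation with the same pre-trained encoder and treat their cosine similarity as a zero-shot measure of progress toward the goal. The sketch below works on plain embedding vectors only; the encoder is left out, and the names `goal_score` and `rank_observations` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def goal_score(obs_embedding: Sequence[float], goal_embedding: Sequence[float]) -> float:
    """Hypothetical zero-shot score: higher when the observation looks closer to the goal.

    Assumes both vectors come from the same (unspecified) pre-trained encoder,
    e.g. an image embedding for the observation and an image, sketch, or text
    embedding for the goal.
    """
    return cosine_similarity(obs_embedding, goal_embedding)


def rank_observations(
    obs_embeddings: Sequence[Sequence[float]], goal_embedding: Sequence[float]
) -> List[int]:
    """Return observation indices sorted from most to least goal-like."""
    scores = [goal_score(o, goal_embedding) for o in obs_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With toy vectors such as a goal of `[1.0, 0.0, 1.0]` and observations `[[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]`, the second observation ranks first because it points in nearly the same direction as the goal. Because no task-specific training is involved, the quality of such a score depends entirely on how well the pre-trained embedding space captures the task-relevant differences between scenes.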
