DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Paper Title

DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Authors

Ivan Kapelyukh, Vitalis Vosylius, Edward Johns

Abstract

We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.
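
The three stages named in the abstract (describe the objects in text, generate a goal image, arrange the objects to match it) can be sketched as a small pipeline. This is only an illustrative outline, not the authors' implementation: the names `DetectedObject`, `describe_scene`, `plan_moves`, and `rearrange` are hypothetical, the prompt wording is invented, and the diffusion model is replaced by a caller-supplied function that returns goal positions directly rather than an image.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float]  # hypothetical (x, y) position on the tabletop


@dataclass
class DetectedObject:
    label: str  # e.g. "fork"; assumed to come from some perception module
    pose: Pose  # where the object currently sits


def describe_scene(objects: List[DetectedObject]) -> str:
    """Stage 1: turn the detected object labels into a text prompt."""
    labels = ", ".join(sorted(o.label for o in objects))
    return f"A top-down photo of a neatly arranged table with: {labels}"


def plan_moves(
    objects: List[DetectedObject], goal_poses: Dict[str, Pose]
) -> List[Tuple[str, Pose, Pose]]:
    """Stage 3: pair each object with its goal pose as (label, start, goal)."""
    return [
        (o.label, o.pose, goal_poses[o.label])
        for o in objects
        if o.label in goal_poses  # skip objects absent from the goal
    ]


def rearrange(
    objects: List[DetectedObject],
    generate_goal: Callable[[str], Dict[str, Pose]],
) -> Tuple[str, List[Tuple[str, Pose, Pose]]]:
    """Run the three stages; generate_goal stands in for the image model."""
    prompt = describe_scene(objects)
    goal_poses = generate_goal(prompt)  # Stage 2: placeholder, assumption
    return prompt, plan_moves(objects, goal_poses)
```

In a real system the stand-in for stage 2 would call an image generator and then extract object poses from the generated image; passing it in as a function keeps the sketch runnable without any model access.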
