Paper title
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
Paper authors
Abstract
We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.
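
The three stages named in the abstract (describe the objects in text, generate a goal arrangement, move the objects to match it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the prompt wording, the matching procedure, or the robot interface, so `SceneObject`, `build_prompt`, `plan_rearrangement`, and `fake_generator` are hypothetical names, and the image-generation step is replaced by a stub that returns goal positions directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Move = Tuple[str, Point, Point]  # (caption, current position, goal position)


@dataclass
class SceneObject:
    caption: str     # text description inferred for the object, e.g. "fork"
    position: Point  # current (x, y) position on the table (assumed units)


def build_prompt(objects: List[SceneObject]) -> str:
    """Join per-object captions into one text prompt (hypothetical wording)."""
    names = ", ".join(o.caption for o in objects)
    return f"A top-down photo of {names}, neatly arranged on a table"


def plan_rearrangement(
    objects: List[SceneObject],
    generate_goal: Callable[[str], Dict[str, Point]],
) -> List[Move]:
    """Return the moves needed to reach the generated goal arrangement.

    `generate_goal` stands in for the image generator plus whatever step
    locates each object in the generated image; here it maps a prompt
    straight to goal positions keyed by caption.
    """
    goal = generate_goal(build_prompt(objects))
    moves: List[Move] = []
    for obj in objects:
        target = goal[obj.caption]
        if target != obj.position:  # skip objects that are already in place
            moves.append((obj.caption, obj.position, target))
    return moves


def fake_generator(prompt: str) -> Dict[str, Point]:
    """Stub goal generator: a fixed place setting, ignoring the prompt."""
    return {"fork": (0.2, 0.5), "plate": (0.5, 0.5), "knife": (0.8, 0.5)}


scene = [
    SceneObject("fork", (0.7, 0.1)),
    SceneObject("plate", (0.5, 0.5)),
    SceneObject("knife", (0.1, 0.9)),
]
moves = plan_rearrangement(scene, fake_generator)
for caption, start, end in moves:
    print(f"move {caption}: {start} -> {end}")
```

In a real system the stub would be replaced by a call to an image generator and by a perception step that matches objects in the generated image to objects in the scene; the sketch only shows how the stages hand data to one another.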