Paper Title
StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects
Paper Authors
Paper Abstract
Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects in both simulation and real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously-unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.
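To make the sample-then-filter idea in the abstract concrete, below is a minimal PyTorch sketch of that pattern: a toy transformer-based denoiser proposes per-object goal poses conditioned on (stand-in) point-cloud/language features, and a toy collision discriminator re-ranks the samples. All class names, shapes, and the simplified reverse-diffusion update are illustrative assumptions, not the authors' actual StructDiffusion implementation.

```python
# Hypothetical sketch of "diffusion proposes arrangements, discriminator filters them".
# Module names, feature dimensions, and the reverse update are placeholders.
import torch
import torch.nn as nn


class PoseDenoiser(nn.Module):
    """Toy transformer that denoises per-object poses (e.g., 3D translation + 6D rotation)."""
    def __init__(self, d_model=128, pose_dim=9):
        super().__init__()
        self.obj_embed = nn.Linear(pose_dim + d_model, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_poses, obj_features, t):
        # noisy_poses: (B, N, pose_dim); obj_features: (B, N, d_model); t: (B,)
        x = self.obj_embed(torch.cat([noisy_poses, obj_features], dim=-1))
        x = x + t.view(-1, 1, 1)            # crude timestep conditioning
        return self.head(self.encoder(x))   # predicted noise per object


class CollisionDiscriminator(nn.Module):
    """Toy scorer: higher output means the arrangement looks more collision-free."""
    def __init__(self, d_model=128, pose_dim=9):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(pose_dim + d_model, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, poses, obj_features):
        per_obj = self.mlp(torch.cat([poses, obj_features], dim=-1))  # (B, N, 1)
        return per_obj.mean(dim=(1, 2))                               # (B,)


@torch.no_grad()
def sample_and_filter(denoiser, discriminator, obj_features, n_samples=8, steps=50):
    """Sample candidate arrangements from noise, keep the best-scoring one per scene."""
    B, N, _ = obj_features.shape
    feats = obj_features.repeat_interleave(n_samples, dim=0)
    poses = torch.randn(B * n_samples, N, 9)            # start from Gaussian noise
    for step in reversed(range(steps)):
        t = torch.full((B * n_samples,), step / steps)
        eps = denoiser(poses, feats, t)
        poses = poses - eps / steps                      # simplified reverse update
    scores = discriminator(poses, feats).view(B, n_samples)
    best = scores.argmax(dim=1)
    return poses.view(B, n_samples, N, 9)[torch.arange(B), best]


if __name__ == "__main__":
    feats = torch.randn(2, 4, 128)   # stand-in for per-object point-cloud + language features
    best_poses = sample_and_filter(PoseDenoiser(), CollisionDiscriminator(), feats)
    print(best_poses.shape)          # torch.Size([2, 4, 9])
```

The design point this sketch illustrates is the abstract's claim about generalization: the generative model can propose many plausible arrangements for unseen objects, and a separate discriminator rejects samples that would be physically invalid, rather than relying on a single deterministic prediction.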