A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search

Paper Title

A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search

Authors

Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Gaurav S. Sukhatme, Ruslan Salakhutdinov

Abstract

Physically rearranging objects is an important capability for embodied agents. Visual room rearrangement evaluates an agent's ability to rearrange objects in a room to a desired goal based solely on visual input. We propose a simple yet effective method for this problem: (1) search for and map which objects need to be rearranged, and (2) rearrange each object until the task is complete. Our approach consists of an off-the-shelf semantic segmentation model, voxel-based semantic map, and semantic search policy to efficiently find objects that need to be rearranged. On the AI2-THOR Rearrangement Challenge, our method improves on current state-of-the-art end-to-end reinforcement learning-based methods that learn visual rearrangement policies from 0.53% correct rearrangement to 16.56%, using only 2.7% as many samples from the environment.
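
The voxel-based semantic map mentioned in the abstract can be pictured with a small sketch. The code below is not the authors' implementation: the class `SemanticVoxelMap`, the helper `differing_voxels`, the majority-vote labeling, and the idea of comparing a map of the current room against a map of the goal room are all assumptions made for illustration. It takes 3D points that already carry a semantic label (for example, from a segmentation model combined with depth) and shows one plausible way to flag the locations where two maps disagree.

```python
import math
from collections import Counter, defaultdict


class SemanticVoxelMap:
    """Sparse voxel grid holding semantic label votes per voxel (illustrative only)."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size          # edge length of one voxel, in meters
        self.votes = defaultdict(Counter)     # voxel index -> label vote counts

    def voxel_index(self, point):
        # Map a continuous (x, y, z) point to integer voxel coordinates.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def add(self, point, label):
        # Record one labeled observation falling inside the point's voxel.
        self.votes[self.voxel_index(point)][label] += 1

    def label_at(self, index):
        # Return the most frequently observed label, or None for an unseen voxel.
        counts = self.votes.get(index)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


def differing_voxels(current_map, goal_map):
    """Voxels whose majority label differs between the two maps.

    Assumption: such voxels are candidates for objects that need rearranging.
    """
    indices = set(current_map.votes) | set(goal_map.votes)
    return {
        idx for idx in indices
        if current_map.label_at(idx) != goal_map.label_at(idx)
    }
```

In this sketch, a mug observed in one voxel of the current room and in a different voxel of the goal room would cause both voxels to be flagged, while a table seen in the same voxel in both maps would not.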
