Paper Title

Object Manipulation via Visual Target Localization

Authors

Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

Abstract

Object manipulation is a critical skill required for embodied AI agents interacting with the world around them. Training agents to manipulate objects poses many challenges. These include occlusion of the target object by the agent's arm, noisy object detection and localization, and the target frequently going out of view as the agent moves around in the scene. We propose Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode. Our evaluations show a 3x improvement in success rate over a model that has access to the same sensory suite but is trained without the object location estimator, and our analysis shows that our agent is robust to noise in depth perception and agent localization. Importantly, our proposed approach relaxes several assumptions about idealized localization and perception that are commonly employed by recent works in embodied AI, an important step towards training agents for object manipulation in the real world.
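
The idea of computing a target's 3D coordinates once and then continuing to estimate them while the target is out of view can be illustrated with a purely geometric sketch. This is not the paper's implementation, and the abstract does not describe how m-VOLE's estimator is built. The sketch assumes a pinhole camera with known intrinsics, a depth reading at the detected pixel, and an agent pose given as a position plus a yaw angle about the vertical axis. All names, including `TargetLocationEstimator`, are hypothetical.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) with depth -> camera-frame (x, y, z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(p_cam, agent_pos, yaw):
    """Rotate by the agent's yaw about the vertical (y) axis, then translate."""
    x, y, z = p_cam
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z + agent_pos[0],
            y + agent_pos[1],
            -s * x + c * z + agent_pos[2])


def world_to_camera(p_world, agent_pos, yaw):
    """Inverse of camera_to_world: translate back, then apply the transposed rotation."""
    x = p_world[0] - agent_pos[0]
    y = p_world[1] - agent_pos[1]
    z = p_world[2] - agent_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * z, y, s * x + c * z)


class TargetLocationEstimator:
    """Hypothetical helper: remembers the last observed target position in the
    world frame and re-expresses it in the current camera frame at every step."""

    def __init__(self, intrinsics):
        self.intrinsics = intrinsics  # assumed (fx, fy, cx, cy)
        self.target_world = None      # unknown until the first detection

    def update(self, detection, agent_pos, yaw):
        """detection is (u, v, depth), or None when the target is not visible.
        Returns the target in the current camera frame, or None if never seen."""
        if detection is not None:
            u, v, depth = detection
            p_cam = backproject(u, v, depth, *self.intrinsics)
            self.target_world = camera_to_world(p_cam, agent_pos, yaw)
        if self.target_world is None:
            return None
        return world_to_camera(self.target_world, agent_pos, yaw)


estimator = TargetLocationEstimator(intrinsics=(100.0, 100.0, 50.0, 50.0))
not_seen_yet = estimator.update(None, (0.0, 0.0, 0.0), 0.0)
first_sighting = estimator.update((50.0, 50.0, 2.0), (0.0, 0.0, 0.0), 0.0)
# Target is no longer detected, but the agent has moved 1 m forward.
after_move = estimator.update(None, (0.0, 0.0, 1.0), 0.0)
```

In this sketch the estimate depends entirely on the agent's pose, so any noise in depth or localization carries straight into the result. The abstract reports that the authors' agent is robust to such noise, which this simplified version does not attempt to reproduce.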
