Paper Title
Is Mapping Necessary for Realistic PointGoal Navigation?
Paper Authors
Paper Abstract
Can an autonomous agent navigate in a new environment without building an explicit map? For the task of PointGoal navigation ('Go to $Δx$, $Δy$') under idealized settings (no RGB-D or actuation noise, perfect GPS+Compass), the answer is a clear 'yes': map-less neural models composed of task-agnostic components (CNNs and RNNs) trained with large-scale reinforcement learning achieve 100% Success on a standard dataset (Gibson). However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question, one we tackle in this paper. The strongest published result for this task is 71.7% Success. First, we identify the main (perhaps, only) cause of the drop in performance: the absence of GPS+Compass. An agent with perfect GPS+Compass, faced with RGB-D sensing and actuation noise, achieves 99.8% Success (Gibson-v2 val). This suggests that (to paraphrase a meme) robust visual odometry is all we need for realistic PointNav; if we can achieve that, we can ignore the sensing and actuation noise. With that as our operating hypothesis, we scale the dataset and model size, and develop human-annotation-free data-augmentation techniques to train models for visual odometry. We advance the state of the art on the Habitat Realistic PointNav Challenge from 71% to 94% Success (+23 absolute, 31% relative) and from 53% to 74% SPL (+21 absolute, 40% relative). While our approach does not saturate or 'solve' this dataset, this strong improvement, combined with promising zero-shot sim2real transfer (to a LoCoBot), provides evidence consistent with the hypothesis that explicit mapping may not be necessary for navigation, even in a realistic setting.
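To make the operating hypothesis concrete, the sketch below illustrates how an egomotion estimate can stand in for GPS+Compass: the agent tracks the goal in its own coordinate frame and re-expresses it after every action using the predicted translation and rotation, and SPL is computed over the resulting trajectories. This is a minimal illustration only, not the authors' implementation; the function names and the planar (x, y, yaw) parameterization are assumptions for exposition.

```python
# Minimal sketch (not the authors' code): using a visual-odometry estimate
# (dx, dy, dyaw) between consecutive RGB-D frames to keep the PointGoal
# expressed in the agent's current frame, plus the SPL metric the abstract
# reports. All names here are hypothetical placeholders.
import numpy as np

def update_goal_estimate(goal_xy, ego_dx, ego_dy, ego_dyaw):
    """Re-express the goal in the agent's new frame after an estimated
    displacement (ego_dx, ego_dy) and rotation ego_dyaw (radians),
    both expressed in the agent's previous frame."""
    # Subtract the agent's estimated displacement (old frame).
    shifted = np.asarray(goal_xy, dtype=float) - np.array([ego_dx, ego_dy])
    # Rotate into the new agent frame (inverse of the agent's rotation).
    c, s = np.cos(-ego_dyaw), np.sin(-ego_dyaw)
    rot = np.array([[c, -s], [s, c]])
    return rot @ shifted

def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (Anderson et al., 2018):
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    s = np.asarray(successes, dtype=float)
    l = np.asarray(shortest_lengths, dtype=float)
    p = np.asarray(path_lengths, dtype=float)
    return float(np.mean(s * l / np.maximum(p, l)))
```

As a usage example, with the goal at (5.0, 2.0) in the agent's frame and an estimated forward step of (0.25, 0.0, 0.0), the updated goal is roughly (4.75, 2.0); the agent can declare success when the norm of the tracked goal falls within the task's success radius. Under this view, sensing and actuation noise only matter insofar as they degrade the odometry estimate, which is why the paper focuses on scaling data and models for visual odometry rather than on building maps.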