Paper Title
Is Mapping Necessary for Realistic PointGoal Navigation?
Paper Authors
Paper Abstract
Can an autonomous agent navigate in a new environment without building an explicit map? For the task of PointGoal navigation ('Go to $Δx$, $Δy$') under idealized settings (no RGB-D or actuation noise, perfect GPS+Compass), the answer is a clear 'yes': map-less neural models composed of task-agnostic components (CNNs and RNNs) trained with large-scale reinforcement learning achieve 100% Success on a standard dataset (Gibson). However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question, one we tackle in this paper. The strongest published result for this task is 71.7% Success. First, we identify the main (perhaps, only) cause of the drop in performance: the absence of GPS+Compass. An agent with perfect GPS+Compass, faced with RGB-D sensing and actuation noise, achieves 99.8% Success (Gibson-v2 val). This suggests that (to paraphrase a meme) robust visual odometry is all we need for realistic PointNav; if we can achieve that, we can ignore the sensing and actuation noise. With that as our operating hypothesis, we scale the dataset and model size, and develop human-annotation-free data-augmentation techniques to train models for visual odometry. We advance the state of the art on the Habitat Realistic PointNav Challenge from 71% to 94% Success (+23 absolute, 31% relative) and from 53% to 74% SPL (+21 absolute, 40% relative). While our approach does not saturate or 'solve' this dataset, this strong improvement, combined with promising zero-shot sim2real transfer (to a LoCoBot), provides evidence consistent with the hypothesis that explicit mapping may not be necessary for navigation, even in a realistic setting.
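To make the operating hypothesis concrete, the sketch below illustrates how an egomotion estimate can stand in for GPS+Compass: the agent tracks the goal in its own coordinate frame and re-expresses it after every action using the predicted translation and rotation, and SPL is computed over the resulting trajectories. This is a minimal illustration only, not the authors' implementation; the function names and the planar (x, y, yaw) parameterization are assumptions for exposition.

```python
# Minimal sketch (not the authors' code): using a visual-odometry estimate
# (dx, dy, dyaw) between consecutive RGB-D frames to keep the PointGoal
# expressed in the agent's current frame, plus the SPL metric the abstract
# reports. All names here are hypothetical placeholders.
import numpy as np

def update_goal_estimate(goal_xy, ego_dx, ego_dy, ego_dyaw):
    """Re-express the goal in the agent's new frame after an estimated
    displacement (ego_dx, ego_dy) and rotation ego_dyaw (radians),
    both expressed in the agent's previous frame."""
    # Subtract the agent's estimated displacement (old frame).
    shifted = np.asarray(goal_xy, dtype=float) - np.array([ego_dx, ego_dy])
    # Rotate into the new agent frame (inverse of the agent's rotation).
    c, s = np.cos(-ego_dyaw), np.sin(-ego_dyaw)
    rot = np.array([[c, -s], [s, c]])
    return rot @ shifted

def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (Anderson et al., 2018):
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    s = np.asarray(successes, dtype=float)
    l = np.asarray(shortest_lengths, dtype=float)
    p = np.asarray(path_lengths, dtype=float)
    return float(np.mean(s * l / np.maximum(p, l)))
```

As a usage example, with the goal at (5.0, 2.0) in the agent's frame and an estimated forward step of (0.25, 0.0, 0.0), the updated goal is roughly (4.75, 2.0); the agent can declare success when the norm of the tracked goal falls within the task's success radius. Under this view, sensing and actuation noise only matter insofar as they degrade the odometry estimate, which is why the paper focuses on scaling data and models for visual odometry rather than on building maps.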