Paper Title

Sim-to-Real Transfer for Vision-and-Language Navigation

Authors

Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

Abstract

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions. Recent work on the task of Vision-and-Language Navigation (VLN) has achieved significant progress in simulation. To assess the implications of this work for robotics, we transfer a VLN agent trained in simulation to a physical robot. To bridge the gap between the high-level discrete action space learned by the VLN agent, and the robot's low-level continuous action space, we propose a subgoal model to identify nearby waypoints, and use domain randomization to mitigate visual domain differences. For accurate sim and real comparisons in parallel environments, we annotate a 325m² office space with 1.3km of navigation instructions, and create a digitized replica in simulation. We find that sim-to-real transfer to an environment not seen in training is successful if an occupancy map and navigation graph can be collected and annotated in advance (success rate of 46.8% vs. 55.9% in sim), but much more challenging in the hardest setting with no prior mapping at all (success rate of 22.5%).
