Paper Title

Towards Versatile Embodied Navigation

Authors

Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang

Abstract

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: can we learn a single powerful agent that masters not one but multiple navigation tasks concurrently? First, we propose VXN, a large-scale 3D dataset that instantiates four classic navigation tasks in standardized, continuous, and audiovisual-rich environments. Second, we propose Vienna, a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model. Building upon a full-attentive architecture, Vienna formulates various navigation tasks as a unified, parse-and-query procedure: the target description, augmented with four task embeddings, is comprehensively interpreted into a set of diversified goal vectors, which are refined as the navigation progresses, and used as queries to retrieve supportive context from episodic history for decision making. This enables the reuse of knowledge across navigation tasks with varying input domains/modalities. We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity.
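The parse-and-query procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`parse_goal`, `query_history`), the learnable "goal slots", the additive use of the task embedding, the dimensions, and the random vectors standing in for learned features are all assumptions made for illustration. The sketch also omits the refinement of goal vectors as navigation progresses. It shows only the general idea of turning a task-tagged target description into several goal vectors and using them as attention queries over an episodic history.

```python
import math
import random

# Hypothetical task names; the abstract lists four task types.
TASKS = ("image_goal", "object_goal", "audio_goal", "vision_language")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, memory):
    """Scaled dot-product attention; memory serves as both keys and values."""
    scale = math.sqrt(len(queries[0]))
    outputs, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, m) / scale for m in memory])
        out = [sum(w * m[i] for w, m in zip(weights, memory))
               for i in range(len(q))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def parse_goal(target_tokens, task_embedding, goal_slots):
    """Parse step (assumed form): tag the target description with the task
    embedding, then let each goal slot attend over the tagged tokens."""
    augmented = [[t + e for t, e in zip(tok, task_embedding)]
                 for tok in target_tokens]
    goal_vectors, _ = attend(goal_slots, augmented)
    return goal_vectors


def query_history(goal_vectors, history):
    """Query step: goal vectors retrieve context from episodic history."""
    return attend(goal_vectors, history)


def rand_vec(rng, dim):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
DIM, NUM_GOALS = 8, 4  # illustrative sizes, not taken from the paper
task_embeddings = {t: rand_vec(rng, DIM) for t in TASKS}
goal_slots = [rand_vec(rng, DIM) for _ in range(NUM_GOALS)]
target_tokens = [rand_vec(rng, DIM) for _ in range(6)]  # encoded target
history = [rand_vec(rng, DIM) for _ in range(10)]       # past observations

goal_vectors = parse_goal(target_tokens, task_embeddings["object_goal"], goal_slots)
context, weights = query_history(goal_vectors, history)
print(len(goal_vectors), len(context[0]), len(weights[0]))
```

In this sketch the same `attend` routine serves every task; only the task embedding changes, which is one simple way a single model could share parameters across tasks with different input modalities.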
