Paper Title

Learning to Set Waypoints for Audio-Visual Navigation

Paper Authors

Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

Abstract

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves. Both new ideas capitalize on the synergy of audio and visual data for revealing the geometry of an unmapped space. We demonstrate our approach on two challenging datasets of real-world 3D scenes, Replica and Matterport3D. Our model improves the state of the art by a substantial margin, and our experiments reveal that learning the links between sights, sounds, and space is essential for audio-visual navigation. Project: http://vision.cs.utexas.edu/projects/audio_visual_waypoints.
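
The abstract's second contribution, the acoustic memory, is described as a structured, spatially grounded record of what the agent has heard while moving. As a rough illustration only, here is a minimal Python sketch of what such a structure could look like; the class name `AcousticMemory`, the grid dimensions, and the two-channel map encoding are assumptions made for this sketch, not the paper's actual implementation:

```python
import numpy as np

class AcousticMemory:
    """Hypothetical sketch of a spatially grounded acoustic memory.

    Stores, on a top-down 2D grid, the sound intensity the agent has
    heard at each cell it visited. All names and shapes here are
    illustrative assumptions, not the paper's implementation.
    """

    def __init__(self, grid_size=20):
        self.intensity = np.zeros((grid_size, grid_size), dtype=np.float32)
        self.visited = np.zeros((grid_size, grid_size), dtype=bool)

    def update(self, cell, heard_intensity):
        """Record the intensity heard at a (row, col) grid cell."""
        r, c = cell
        self.intensity[r, c] = heard_intensity
        self.visited[r, c] = True

    def as_observation(self):
        """Return a 2-channel map (intensity + visitation mask) that a
        navigation policy could consume alongside visual features."""
        return np.stack([self.intensity, self.visited.astype(np.float32)])

# Example: the sound grows fainter as the agent moves away from the source,
# and the memory retains that spatial pattern even after the agent moves on.
memory = AcousticMemory()
memory.update((10, 10), heard_intensity=0.9)
memory.update((10, 11), heard_intensity=0.7)
print(memory.as_observation().shape)  # (2, 20, 20)
```

In the paper's framing, a map of this kind would feed the policy that dynamically predicts the next waypoint, letting past acoustic evidence keep guiding the agent toward the source even when the sound is momentarily faint or occluded.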
