Paper Title
VisualEchoes: Spatial Image Representation Learning through Echolocation
Authors
Abstract
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world. We explore the spatial cues contained in echoes and how they can benefit vision tasks that require spatial reasoning. First, we capture echo responses in photo-realistic 3D indoor scene environments. Then we propose a novel interaction-based representation learning framework that learns useful visual features via echolocation. We show that the learned image features are useful for multiple downstream vision tasks requiring spatial reasoning (monocular depth estimation, surface normal estimation, and visual navigation), with results comparable to or even better than heavily supervised pre-training. Our work opens a new path for representation learning for embodied agents, where supervision comes from interacting with the physical world.
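The abstract rests on the premise that echoes carry spatial cues. The most basic such cue is time of flight: the delay between emitting a sound and hearing its reflection is proportional to the distance to the reflector. The sketch below is a generic sonar-ranging illustration only, not the VisualEchoes method, which learns image features from echoes rather than computing distances directly. It cross-correlates a probe signal with a simulated echo and converts the best-matching lag into a distance. The sample rate, the speed of sound, the pseudo-random probe signal, and the single noise-free reflector are all assumptions, and the function names are hypothetical.

```python
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C (assumption)
SAMPLE_RATE = 44_100    # Hz, hypothetical recording rate


def make_pulse(n: int = 128, seed: int = 0) -> list[float]:
    """Hypothetical probe signal: a pseudo-random sequence of +1/-1 samples."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def simulate_echo(pulse: list[float], distance_m: float, length: int,
                  attenuation: float = 0.3) -> list[float]:
    """Noise-free received signal: one attenuated copy of the pulse, delayed by the round trip."""
    # The sound travels to the reflector and back, so the path length is 2 * distance.
    delay = round(2.0 * distance_m / SPEED_OF_SOUND * SAMPLE_RATE)
    received = [0.0] * length
    for i, s in enumerate(pulse):
        if delay + i < length:
            received[delay + i] += attenuation * s
    return received


def estimate_distance(pulse: list[float], received: list[float]) -> float:
    """Pick the lag that maximizes cross-correlation, then convert round-trip time to distance."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(pulse) + 1):
        score = sum(p * received[lag + i] for i, p in enumerate(pulse))
        if score > best_score:
            best_lag, best_score = lag, score
    round_trip_s = best_lag / SAMPLE_RATE
    return SPEED_OF_SOUND * round_trip_s / 2.0


if __name__ == "__main__":
    pulse = make_pulse()
    received = simulate_echo(pulse, distance_m=3.0, length=2048)
    print(round(estimate_distance(pulse, received), 3))  # 2.998
```

The estimate is quantized to one sample of round-trip time, about 343 / (2 × 44100) ≈ 0.0039 m, which is why a reflector at 3.0 m is reported as 2.998 m. Real rooms produce many overlapping reflections. This is one reason a learned model, rather than a single correlation peak, is a natural way to extract spatial layout from echoes.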