Learning Depth With Very Sparse Supervision

Paper Title

Learning Depth With Very Sparse Supervision

Authors

Antonio Loquercio, Alexey Dosovitskiy, Davide Scaramuzza

Abstract

Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment. Existing works for depth estimation require either massive amounts of annotated training data or some form of hard-coded geometrical constraint. This paper explores a new approach to learning depth perception requiring neither of those. Specifically, we train a specialized global-local network architecture with what would be available to a robot interacting with the environment: from extremely sparse depth measurements down to even a single pixel per image. From a pair of consecutive images, our proposed network outputs a latent representation of the observer's motion between the images and a dense depth map. Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches. We believe that this work, despite its scientific interest, lays the foundations to learn depth from extremely sparse supervision, which can be valuable to all robotic systems acting under severe bandwidth or sensing constraints.
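To make "extremely sparse supervision" concrete, the sketch below computes a depth loss only at pixels where a measurement exists, down to a single pixel per image. The abstract does not state the paper's actual loss or sampling scheme, so the mean absolute error and the names `sparse_depth_loss` and `single_pixel_mask` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sparse_depth_loss(pred, target, mask):
    """Mean absolute depth error over supervised pixels only.

    pred, target: (H, W) float arrays of predicted and measured depth.
    mask: (H, W) boolean array, True where a depth measurement exists.
    The L1 error is an assumption; the abstract does not name the loss.
    """
    n = mask.sum()
    if n == 0:
        return 0.0  # no supervision available for this image
    return float(np.abs(pred - target)[mask].sum() / n)


def single_pixel_mask(shape, rng):
    """Hypothetical helper: supervise one pixel chosen uniformly at random."""
    mask = np.zeros(shape, dtype=bool)
    row = rng.integers(0, shape[0])
    col = rng.integers(0, shape[1])
    mask[row, col] = True
    return mask


rng = np.random.default_rng(0)
target = np.full((4, 6), 10.0)   # toy ground-truth depth map
pred = np.full((4, 6), 12.5)     # toy network prediction
mask = single_pixel_mask(target.shape, rng)
loss = sparse_depth_loss(pred, target, mask)
```

With one supervised pixel, every unmasked prediction contributes nothing to the loss, so the dense depth map has to be shaped by that single measurement per image together with whatever the network learns from the image pair.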
