Paper Title
Distilled Semantics for Comprehensive Scene Understanding from Videos
Paper Authors
Paper Abstract
A complete understanding of the surroundings is paramount to autonomous systems. Recent works have shown that deep neural networks can learn geometry (depth) and motion (optical flow) from a monocular video without any explicit supervision from ground truth annotations, which are particularly hard to source for these two tasks. In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside semantics, with supervision for the latter provided by proxy ground truth distilled from a pre-trained network. We address the three tasks jointly by means of a) a novel training protocol based on knowledge distillation and self-supervision and b) a compact network architecture that enables efficient scene understanding on both power-hungry GPUs and low-power embedded platforms. We thoroughly assess the performance of our framework and show that it yields state-of-the-art results for monocular depth estimation, optical flow, and motion segmentation.
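To make the training protocol concrete, below is a minimal sketch of how a joint objective combining self-supervision (for depth and motion) with knowledge distillation (for semantics) could look. The `student`, `teacher`, and `warp` callables, the loss weight `lambda_sem`, and all signatures are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a joint distillation + self-supervision objective, assuming:
#  - `student` returns (depth, pose, semantic_logits) for an input frame,
#  - `teacher` is a frozen, pre-trained semantic segmentation network,
#  - `warp` synthesizes the target view from the source view using the
#    predicted depth and camera motion (e.g. via F.grid_sample).
# All names and weights are hypothetical placeholders, not the paper's code.

def joint_loss(student, teacher, warp, frame_t, frame_s, lambda_sem=0.1):
    depth, pose, sem_logits = student(frame_t)

    # Self-supervised photometric term: reconstruct the target frame from
    # the source frame and penalize the appearance difference, so depth and
    # motion are learned without ground truth annotations.
    frame_t_hat = warp(frame_s, depth, pose)
    photometric = F.l1_loss(frame_t_hat, frame_t)

    # Knowledge distillation term: the frozen teacher produces proxy
    # semantic labels that supervise the student's semantic head.
    with torch.no_grad():
        proxy_labels = teacher(frame_t).argmax(dim=1)
    semantic = F.cross_entropy(sem_logits, proxy_labels)

    return photometric + lambda_sem * semantic
```

In this reading, the teacher is run only in inference mode to generate proxy ground truth, so the student can be kept compact enough for low-power embedded platforms while still learning all three tasks jointly.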