Paper title
Forecasting of depth and ego-motion with transformers and self-supervision
Paper authors
Paper abstract
This paper addresses the problem of end-to-end self-supervised forecasting of depth and ego-motion. Given a sequence of raw images, the aim is to forecast both the scene geometry and the ego-motion using a self-supervised photometric loss. The architecture combines convolutional and transformer modules, leveraging the benefits of both: the inductive bias of CNNs and the multi-head attention of transformers. This yields a rich spatio-temporal representation that enables accurate depth forecasting. Prior work attempts to solve this problem with multi-modal inputs and outputs trained on supervised ground-truth data, which is impractical because a large annotated dataset is required. In contrast to prior methods, this paper forecasts depth and ego-motion using only raw images as input, trained in a self-supervised manner. The approach performs well on the KITTI dataset benchmark, with several performance metrics even comparable to those of prior non-forecasting self-supervised monocular depth inference methods.
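The self-supervised photometric loss mentioned in the abstract is, in many monocular-depth pipelines, a weighted combination of a structural-similarity (SSIM) term and an L1 term between the target frame and the frame reconstructed by warping. The sketch below is a minimal NumPy illustration of that general recipe, not the paper's exact formulation: the global (non-windowed) SSIM, the function name, and the `alpha = 0.85` weight are assumptions chosen for brevity.

```python
import numpy as np

def photometric_loss(target, reconstructed, alpha=0.85):
    """Self-supervised photometric loss: alpha-weighted SSIM dissimilarity
    plus (1 - alpha) * L1, computed between a target frame and a frame
    reconstructed by view synthesis. Images are float arrays in [0, 1].

    Note: this uses a simplified global SSIM over the whole image; real
    pipelines typically use a windowed (e.g. 3x3 or 11x11) SSIM.
    """
    target = np.asarray(target, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)

    # L1 term: mean absolute photometric error.
    l1 = np.abs(target - reconstructed).mean()

    # Global SSIM with the standard stabilizing constants for [0, 1] images.
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = target.mean(), reconstructed.mean()
    var_x, var_y = target.var(), reconstructed.var()
    cov = ((target - mu_x) * (reconstructed - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

    # SSIM is a similarity in [-1, 1]; (1 - ssim) / 2 maps it to a loss in [0, 1].
    return alpha * (1.0 - ssim) / 2.0 + (1.0 - alpha) * l1
```

For identical frames the loss is zero; as the reconstruction degrades (e.g. from inaccurate depth or pose), both terms grow, which is what lets photometric consistency supervise depth and ego-motion without ground-truth labels.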