Paper Title

Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling

Paper Authors

Yuliang Zou, Pan Ji, Quoc-Huy Tran, Jia-Bin Huang, Manmohan Chandraker

Paper Abstract

Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction using a pose network that features a two-layer convolutional LSTM module. We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO. Inspired by prior geometric systems, we allow the networks to see beyond a small temporal window during training, through a novel loss that incorporates temporally distant (e.g., O(100)) frames. Given GPU memory constraints, we propose a stage-wise training mechanism, where the first stage operates in a local time window and the second stage refines the poses with a "global" loss given the first-stage features. We demonstrate competitive results on several standard VO datasets, including KITTI and TUM RGB-D.
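
The abstract only names the pose network's architecture, so the following is a minimal PyTorch sketch of what a two-layer convolutional LSTM pose head could look like. The class names (`ConvLSTMCell`, `PoseConvLSTM`), channel sizes, and the 6-DoF output head are illustrative assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all four gates from a single convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class PoseConvLSTM(nn.Module):
    """Hypothetical two-layer ConvLSTM head mapping per-frame-pair CNN
    features to 6-DoF relative poses, carrying state across the sequence."""
    def __init__(self, feat_ch=256, hid_ch=256):
        super().__init__()
        self.cell1 = ConvLSTMCell(feat_ch, hid_ch)
        self.cell2 = ConvLSTMCell(hid_ch, hid_ch)
        self.head = nn.Conv2d(hid_ch, 6, 1)  # 3 rotation + 3 translation

    def forward(self, feats):                # feats: (B, T, C, H, W)
        B, T, C, H, W = feats.shape
        z = feats.new_zeros(B, self.cell1.hid_ch, H, W)
        s1, s2 = (z, z.clone()), (z.clone(), z.clone())
        poses = []
        for t in range(T):
            h1, c1 = self.cell1(feats[:, t], s1); s1 = (h1, c1)
            h2, c2 = self.cell2(h1, s2);          s2 = (h2, c2)
            poses.append(self.head(h2).mean(dim=(2, 3)))  # (B, 6) per step
        return torch.stack(poses, dim=1)     # (B, T, 6)
```

Because the recurrent state persists across time steps, each predicted pose can depend on the whole history of the sequence rather than on a single frame pair.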
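The cycle consistency loss is likewise only described at a high level. A minimal sketch of the general idea, assuming relative poses are represented as (B, 4, 4) SE(3) matrices: composing the forward-pass poses with the backward-pass poses around a loop should recover the identity transform, echoing loop closure in geometric VO. The function name and mean-squared penalty are assumptions for illustration:

```python
import torch

def cycle_consistency_loss(poses_fwd, poses_bwd):
    """poses_fwd, poses_bwd: lists of (B, 4, 4) SE(3) matrices forming a
    loop, e.g. frame 0 -> 1 -> ... -> N and back N -> ... -> 0."""
    B = poses_fwd[0].shape[0]
    T = torch.eye(4, device=poses_fwd[0].device).expand(B, 4, 4)
    for P in poses_fwd + poses_bwd:   # compose the full loop of transforms
        T = torch.bmm(P, T)
    I = torch.eye(4, device=T.device).expand(B, 4, 4)
    return torch.mean((T - I) ** 2)   # penalize deviation from identity
```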
