Paper Title

SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery

Paper Authors

Hao Qu, Lilian Zhang, Xiaoping Hu, Xiaofeng He, Xianfei Pan, Changhao Chen

Paper Abstract

Accurately perceiving location and scene is crucial for autonomous driving and mobile robots. Recent advances in deep learning have made it possible to learn egomotion and depth from monocular images in a self-supervised manner, without requiring highly precise labels to train the networks. However, monocular vision methods suffer from a limitation known as scale ambiguity, which restricts their application when absolute scale is required. To address this, we propose SelfOdom, a self-supervised dual-network framework that can robustly and consistently learn and generate pose and depth estimates at global scale from monocular images. In particular, we introduce a novel coarse-to-fine training strategy that enables the metric scale to be recovered in a two-stage process. Furthermore, SelfOdom is flexible and can incorporate inertial data with images through an attention-based fusion module, which improves its robustness in challenging scenarios. Our model excels in both normal and challenging lighting conditions, including difficult night scenes. Extensive experiments on public datasets demonstrate that SelfOdom outperforms representative traditional and learning-based VO and VIO models.
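
The abstract mentions an attention-based fusion module that combines inertial data with images. The following is a minimal, hypothetical PyTorch sketch of such soft-attention sensor fusion; the feature dimensions, the channel-wise gating scheme, and the class name AttentionFusion are assumptions for illustration only, not the architecture described in the paper.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Illustrative soft-attention fusion of visual and inertial features.

    Hypothetical sketch: the abstract only states that SelfOdom fuses
    inertial data with images via an attention-based fusion module; the
    dimensions and gating scheme below are assumptions, not the paper's design.
    """

    def __init__(self, visual_dim: int = 512, inertial_dim: int = 256):
        super().__init__()
        fused_dim = visual_dim + inertial_dim
        # Attention weights are predicted from the concatenated features
        # and applied channel-wise before any downstream pose regression.
        self.attention = nn.Sequential(
            nn.Linear(fused_dim, fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, visual_feat: torch.Tensor, inertial_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([visual_feat, inertial_feat], dim=-1)  # (B, visual_dim + inertial_dim)
        weights = self.attention(fused)                          # channel-wise attention in [0, 1]
        return fused * weights                                   # re-weighted joint feature

# Example: fuse one batch of visual and inertial feature vectors.
if __name__ == "__main__":
    fusion = AttentionFusion()
    v = torch.randn(4, 512)    # e.g. CNN features from an image pair
    i = torch.randn(4, 256)    # e.g. recurrent features from IMU measurements
    print(fusion(v, i).shape)  # torch.Size([4, 768])
```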
