Paper Title

Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video

Paper Authors

Shun Taguchi, Noriaki Hirose

Paper Abstract


We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences. Monocular camera re-localization refers to the task of estimating the absolute camera pose from an instance image in a known environment; it has been intensively studied as an alternative means of localization in GPS-denied environments. In recent works, camera re-localization methods are trained via supervised learning from pairs of camera images and camera poses. In contrast to previous works, we propose a completely unsupervised learning framework for camera re-localization and depth estimation that requires only monocular video sequences for training. In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image, which are then combined to estimate the camera pose. The networks can be trained by minimizing loss functions based on our loop-closed view synthesis. In experiments on the 7-Scenes dataset, the proposed method outperforms the re-localization of the state-of-the-art visual SLAM system ORB-SLAM3. Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
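The abstract's core mechanism, combining a predicted scene-coordinate map (world-frame 3D points) with a predicted depth map to recover the camera pose, can be illustrated as a rigid-alignment problem. The sketch below is not the paper's implementation: it assumes the pose is obtained by back-projecting the depth map into camera-frame points and aligning them with the predicted world-frame scene coordinates via the closed-form Kabsch/Umeyama solve; the function names (`backproject`, `estimate_pose`) and the direct SVD solve are illustrative assumptions.

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map (H, W) into camera-frame 3D points (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T        # pixel coordinates -> camera rays
    return rays * depth[..., None]         # scale each ray by its depth

def estimate_pose(scene_coords, cam_points):
    """Recover the camera-to-world rigid transform (R, t) that best aligns
    camera-frame points to predicted world-frame scene coordinates,
    using the closed-form Kabsch/Umeyama least-squares solution."""
    X = cam_points.reshape(-1, 3)          # points in the camera frame
    Y = scene_coords.reshape(-1, 3)        # corresponding world coordinates
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    H = (X - mu_x).T @ (Y - mu_y)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                     # rotation with det(R) = +1
    t = mu_y - R @ mu_x                    # translation
    return R, t
```

In this reading, the depth network supplies `cam_points` (geometry in the camera frame) and the scene-coordinate network supplies `scene_coords` (the same geometry in the world frame), so the absolute pose falls out of a single closed-form alignment rather than iterative optimization.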
