Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry

Paper title

Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry

Authors

Wang, Guangming; Zhang, Chi; Wang, Hesheng; Wang, Jingchuan; Wang, Yong; Wang, Xinlei

Abstract

In autonomous driving, monocular sequences contain a large amount of information. Monocular depth estimation, camera ego-motion estimation, and optical flow estimation from consecutive frames have recently attracted considerable attention. By analyzing these tasks, the pixels in the middle frame are modeled as three parts: the rigid region, the non-rigid region, and the occluded region. In joint unsupervised training of depth and pose, the occluded region can be segmented explicitly. The occlusion information is used in the unsupervised learning of depth, pose, and optical flow, because images reconstructed from depth and pose or from optical flow are invalid in occluded regions. A less-than-mean mask is designed to further exclude mismatched pixels affected by motion or illumination change when training the depth and pose networks. The same method is also used to exclude some trivial mismatched pixels when training the optical flow network. Maximum normalization is proposed for the depth smoothness term to restrain depth degradation in textureless regions. In the occluded region, depth and camera motion provide more reliable motion estimation, so they are used to guide the unsupervised learning of optical flow. Experiments on the KITTI dataset demonstrate that the model based on three regions, with full and explicit segmentation of the occluded region, the rigid region, and the non-rigid region and with corresponding unsupervised losses, significantly improves performance on all three tasks. The source code is available at: https://github.com/guangmingw/DOPlearning.
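The abstract names a less-than-mean mask but does not define it. The sketch below is one plausible reading, not the authors' implementation: among pixels that are not occluded, keep only those whose photometric reconstruction error is strictly below the mean error of the visible pixels, then average the loss over the kept pixels. The function names `less_than_mean_mask` and `masked_mean`, the strict comparison, and the nested-list image representation are all assumptions made for illustration.

```python
def less_than_mean_mask(error, visible):
    """Return a boolean mask keeping visible pixels with below-mean error.

    error   : 2D list of per-pixel photometric errors (hypothetical input)
    visible : 2D list of booleans, False where the pixel is occluded
    Assumption: the mean is taken over visible pixels only.
    """
    visible_errors = [
        e
        for row_e, row_v in zip(error, visible)
        for e, v in zip(row_e, row_v)
        if v
    ]
    if not visible_errors:
        # Nothing is visible, so nothing can be kept.
        return [[False] * len(row) for row in error]
    mean_error = sum(visible_errors) / len(visible_errors)
    return [
        [bool(v) and e < mean_error for e, v in zip(row_e, row_v)]
        for row_e, row_v in zip(error, visible)
    ]


def masked_mean(error, mask):
    """Average the error over the pixels selected by the mask."""
    kept = [
        e
        for row_e, row_m in zip(error, mask)
        for e, m in zip(row_e, row_m)
        if m
    ]
    return sum(kept) / len(kept) if kept else 0.0


# Toy 2x2 example: the bottom-right pixel is occluded, and the
# bottom-left pixel has an above-mean error (e.g. a moving object).
error = [[0.1, 0.2], [0.9, 5.0]]
visible = [[True, True], [True, False]]
mask = less_than_mean_mask(error, visible)
loss = masked_mean(error, mask)
```

In this toy case the occluded pixel is excluded first, and the remaining high-error pixel is then dropped because it exceeds the mean of the visible errors, so only the two well-matched pixels contribute to the loss.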

Paper title