Paper Title
EffiScene: Efficient Per-Pixel Rigidity Inference for Unsupervised Joint Learning of Optical Flow, Depth, Camera Pose and Motion Segmentation
Paper Authors
Paper Abstract
This paper addresses the challenging unsupervised scene flow estimation problem by jointly learning four low-level vision sub-tasks: optical flow $\textbf{F}$, stereo-depth $\textbf{D}$, camera pose $\textbf{P}$ and motion segmentation $\textbf{S}$. Our key insight is that the rigidity of the scene shares the same inherent geometric structure as object motion and scene depth. Hence, rigidity from $\textbf{S}$ can be inferred by jointly coupling $\textbf{F}$, $\textbf{D}$ and $\textbf{P}$ to achieve more robust estimation. To this end, we propose a novel scene flow framework named EffiScene with efficient joint rigidity learning, going beyond existing pipelines with independent auxiliary structures. In EffiScene, we first estimate optical flow and depth at the coarse level and then compute camera pose by the Perspective-$n$-Point (PnP) method. To jointly learn local rigidity, we design a novel Rigidity From Motion (RfM) layer with three principal components: \emph{(i)} correlation extraction; \emph{(ii)} boundary learning; and \emph{(iii)} outlier exclusion. Final outputs are fused at finer levels based on the rigidity map $M_R$ from RfM. To efficiently train EffiScene, two new losses $\mathcal{L}_{bnd}$ and $\mathcal{L}_{unc}$ are designed to prevent trivial solutions and to regularize flow discontinuities at boundaries. Extensive experiments on the KITTI scene flow benchmark show that our method is effective and significantly improves over state-of-the-art approaches on all sub-tasks, i.e., optical flow ($5.19 \rightarrow 4.20$), depth estimation ($3.78 \rightarrow 3.46$), visual odometry ($0.012 \rightarrow 0.011$) and motion segmentation ($0.57 \rightarrow 0.62$).
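The abstract only sketches the pipeline, but two of its steps can be illustrated concretely: recovering camera pose with PnP from coarse depth and flow correspondences, and fusing flow fields with the rigidity map $M_R$. The sketch below is an assumption-laden illustration, not the authors' implementation: the helper names (`pose_from_pnp`, `fuse_with_rigidity`), the use of OpenCV's `solvePnPRansac`, and the convex-combination fusion rule are all hypothetical details filled in for clarity.

```python
# Minimal sketch (NOT the authors' code): pose via PnP + RANSAC from coarse
# depth/flow, then fusion of rigid and estimated flow with rigidity map M_R.
import numpy as np
import cv2

def pose_from_pnp(depth, flow, K):
    """Recover camera pose with the Perspective-n-Point method.

    depth: (H, W) coarse depth map of frame t
    flow:  (H, W, 2) coarse optical flow from frame t to t+1
    K:     (3, 3) camera intrinsics
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)      # (H, W, 2)

    # Back-project frame-t pixels to 3D points using the depth map.
    z = depth.reshape(-1).astype(np.float64)
    ones = np.ones_like(z)
    rays = np.linalg.inv(K) @ np.stack(
        [pix[..., 0].ravel(), pix[..., 1].ravel(), ones])     # (3, N)
    pts3d = (rays * z).T                                       # (N, 3)

    # Corresponding 2D locations in frame t+1 come from the optical flow.
    # (Filtering of invalid/zero-depth pixels is omitted in this sketch.)
    pts2d = (pix + flow).reshape(-1, 2)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts2d.astype(np.float32),
        K.astype(np.float32), distCoeffs=None)
    return rvec, tvec, inliers

def fuse_with_rigidity(flow_est, flow_rigid, M_R):
    """Assumed fusion rule: convex combination weighted by rigidity.

    M_R near 1 marks rigid (static) pixels, where the flow induced by
    depth + camera pose is trusted; elsewhere the estimated flow is kept.
    """
    M = M_R[..., None]                                         # (H, W, 1)
    return M * flow_rigid + (1.0 - M) * flow_est
```

Under these assumptions, the RANSAC inlier mask from PnP also hints at where the outlier-exclusion component of RfM gets its signal: pixels whose flow disagrees with the rigid-scene hypothesis are natural candidates for the non-rigid (moving) segment.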