Paper Title
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
Paper Authors
Vaishakh Patil, Christos Sakaridis, Alexander Liniger, Luc Van Gool
Paper Abstract
Monocular depth estimation is vital for scene understanding and downstream tasks. We focus on the supervised setup, in which ground-truth depth is available only at training time. Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth. In particular, we introduce a piecewise planarity prior which states that for each pixel, there is a seed pixel which shares the same planar 3D surface with the former. Motivated by this prior, we design a network with two heads. The first head outputs pixel-level plane coefficients, while the second one outputs a dense offset vector field that identifies the positions of seed pixels. The plane coefficients of seed pixels are then used to predict depth at each position. The resulting prediction is adaptively fused with the initial prediction from the first head via a learned confidence to account for potential deviations from precise local planarity. The entire architecture is trained end-to-end thanks to the differentiability of the proposed modules and it learns to predict regular depth maps, with sharp edges at occlusion boundaries. An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation, surpassing prior methods on NYU Depth-v2 and on the Garg split of KITTI. Our method delivers depth maps that yield plausible 3D reconstructions of the input scenes. Code is available at: https://github.com/SysCV/P3Depth
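To make the two-head design concrete, the snippet below is a minimal PyTorch-style sketch of the mechanism the abstract describes, not the authors' actual implementation. It assumes a per-pixel plane parametrization n·X = o, so that the depth along a camera ray r = K⁻¹(u, v, 1)ᵀ is o / (n·r); it further assumes a dense offset field in pixel units for locating seed pixels and a sigmoid-normalized confidence map for the fusion. The function names (`plane_to_depth`, `sample_seed_coeffs`, `p3depth_forward`) and the exact fusion rule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def plane_to_depth(coeffs, rays, eps=1e-6):
    """Depth induced by a per-pixel plane n.X = o along camera rays.
    coeffs: (B, 4, H, W) -- unit normal n (3 channels) + offset o (1 channel)
    rays:   (B, 3, H, W) -- back-projected rays K^{-1} [u, v, 1]^T
    """
    n, o = coeffs[:, :3], coeffs[:, 3:4]
    denom = (n * rays).sum(dim=1, keepdim=True)
    # Guard against rays (nearly) parallel to the plane.
    return o / denom.clamp(min=eps)                         # (B, 1, H, W)

def sample_seed_coeffs(coeffs, offsets):
    """Gather plane coefficients at seed positions specified by a
    dense pixel-space offset field, via differentiable bilinear sampling."""
    B, _, H, W = coeffs.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(coeffs.device)  # (2, H, W)
    pos = base.unsqueeze(0) + offsets                              # (B, 2, H, W)
    # Normalize to [-1, 1] as grid_sample expects (x first, then y).
    grid = torch.stack((2 * pos[:, 0] / (W - 1) - 1,
                        2 * pos[:, 1] / (H - 1) - 1), dim=-1)      # (B, H, W, 2)
    return F.grid_sample(coeffs, grid, align_corners=True)

def p3depth_forward(coeffs, offsets, confidence, rays):
    """Fuse the initial per-pixel depth with the seed-plane depth
    using a learned confidence, as sketched from the abstract."""
    d_init = plane_to_depth(coeffs, rays)            # first head's own prediction
    d_seed = plane_to_depth(sample_seed_coeffs(coeffs, offsets), rays)
    c = torch.sigmoid(confidence)                    # confidence in [0, 1]
    return c * d_seed + (1 - c) * d_init
```

Because the bilinear sampling and the fusion above are both differentiable, gradients from a depth loss flow back through the seed selection and the confidence, which is what permits the end-to-end training the abstract refers to.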