Paper Title

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

Authors

Xiaofeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang

Abstract

Learning-based Multi-View Stereo (MVS) methods warp source images into the reference camera frustum to form 3D volumes, which are fused as a cost volume to be regularized by subsequent networks. The fusing step plays a vital role in bridging 2D semantics and 3D spatial associations. However, previous methods utilize extra networks to learn 2D information as fusing cues, underusing 3D spatial correlations and bringing additional computation costs. Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently. Specifically, the epipolar Transformer utilizes a detachable monocular depth estimator to enhance 2D semantics and uses cross-attention to construct data-dependent 3D associations along the epipolar line. Additionally, MVSTER is built in a cascade structure, where entropy-regularized optimal transport is leveraged to propagate finer depth estimations in each stage. Extensive experiments show MVSTER achieves state-of-the-art reconstruction performance with significantly higher efficiency: Compared with MVSNet and CasMVSNet, our MVSTER achieves 34% and 14% relative improvements on the DTU benchmark, with 80% and 51% relative reductions in running time. MVSTER also ranks first on Tanks&Temples-Advanced among all published works. Code is released at https://github.com/JeffWang987.
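
To make the epipolar cross-attention concrete, below is a minimal PyTorch sketch, not the authors' released implementation: all names, tensor shapes, and the single-head formulation are illustrative assumptions. It treats each reference-view feature as a query and the source-view features warped onto D depth hypotheses (i.e. D samples along the epipolar line) as keys, producing a data-dependent association over the hypotheses:

```python
import torch
import torch.nn.functional as F

def epipolar_cross_attention(ref_feat, src_feat_volume):
    """Hypothetical sketch of cross-attention along the epipolar line.

    ref_feat:        (B, C, H, W)    reference-view features (queries)
    src_feat_volume: (B, D, C, H, W) source features warped onto D depth
                                     hypotheses, i.e. D samples along the
                                     epipolar line per reference pixel
    returns:         (B, D, H, W)    attention over depth hypotheses
    """
    B, D, C, H, W = src_feat_volume.shape
    q = ref_feat.permute(0, 2, 3, 1).reshape(B, H * W, C)               # (B, N, C)
    k = src_feat_volume.permute(0, 3, 4, 1, 2).reshape(B, H * W, D, C)  # (B, N, D, C)
    # Scaled dot-product similarity between each reference pixel and its
    # D epipolar samples; softmax over D yields a data-dependent
    # 3D association along the epipolar line.
    attn = torch.einsum('bnc,bndc->bnd', q, k) / C ** 0.5
    attn = F.softmax(attn, dim=-1)
    return attn.reshape(B, H, W, D).permute(0, 3, 1, 2)
```

Likewise, the entropy-regularized optimal transport used to propagate depth across cascade stages can be solved with Sinkhorn iterations. The sketch below is a generic Sinkhorn solver under assumed uniform marginals; the actual cost construction and marginals in MVSTER may differ:

```python
import torch

def sinkhorn(cost, eps=0.1, n_iters=50):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (N, M) cost matrix, e.g. between N predicted depth hypotheses
          and M target depth bins (illustrative assumption)
    returns the (N, M) transport plan with uniform marginals
    """
    N, M = cost.shape
    K = torch.exp(-cost / eps)          # Gibbs kernel from the cost
    r = torch.full((N,), 1.0 / N)       # uniform source marginal (assumed)
    c = torch.full((M,), 1.0 / M)       # uniform target marginal (assumed)
    v = torch.ones(M)
    for _ in range(n_iters):            # alternating marginal scaling
        u = r / (K @ v)
        v = c / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)
```

The entropy term (controlled by eps) keeps the transport plan smooth and the iterations differentiable, which is what makes this formulation usable as a training-time supervision signal.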
