Paper Title

WT-MVSNet: Window-based Transformers for Multi-view Stereo

Authors

Jinli Liao, Yikang Ding, Yoli Shavit, Dihe Huang, Shihao Ren, Jia Guo, Wensen Feng, Kai Zhang

Abstract

Recently, Transformers were shown to enhance the performance of multi-view stereo by enabling long-range feature interaction. In this work, we propose Window-based Transformers (WT) for local feature matching and global feature aggregation in multi-view stereo. We introduce a Window-based Epipolar Transformer (WET), which reduces matching redundancy by using epipolar constraints. Since point-to-line matching is sensitive to erroneous camera pose and calibration, we match windows near the epipolar lines. A second Shifted WT is employed for aggregating global information within the cost volume. We present a novel Cost Transformer (CT) to replace 3D convolutions for cost volume regularization. In order to better constrain the estimated depth maps from multiple views, we further design a novel geometric consistency loss (Geo Loss), which penalizes unreliable areas where multi-view consistency is not satisfied. Our WT multi-view stereo method (WT-MVSNet) achieves state-of-the-art performance across multiple datasets and ranks $1^{st}$ on the Tanks and Temples benchmark.
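
To make the idea of matching windows near the epipolar lines concrete, the following is a minimal NumPy sketch of the underlying geometry only. It is not the authors' WET implementation: the function names, the non-overlapping window grid, the window size `win`, and the distance threshold are all assumptions chosen for illustration. Given calibrated cameras, it computes the epipolar line of a reference pixel in the source view and selects the source windows whose centers lie close to that line.

```python
import numpy as np


def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K_ref, K_src, R, t):
    """F with q_src^T F p_ref = 0, assuming X_src = R @ X_ref + t."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K_src).T @ E @ np.linalg.inv(K_ref)


def windows_near_epipolar_line(p_ref, F, height, width, win=8, margin=0.0):
    """Hypothetical helper: (row, col) indices of win x win source windows
    whose centers lie near the epipolar line of reference pixel p_ref = (x, y).
    Assumes the line is not at infinity (a and b are not both zero)."""
    # Epipolar line a*x + b*y + c = 0 in the source image.
    a, b, c = F @ np.array([p_ref[0], p_ref[1], 1.0])
    norm = np.hypot(a, b)

    # Centers of a non-overlapping window grid (an illustrative assumption).
    rows = np.arange(height // win)
    cols = np.arange(width // win)
    cy = rows[:, None] * win + (win - 1) / 2.0
    cx = cols[None, :] * win + (win - 1) / 2.0

    # Perpendicular distance from each window center to the line.
    dist = np.abs(a * cx + b * cy + c) / norm

    # Any window the line crosses has its center within half a window
    # diagonal of the line, so this threshold is conservative.
    keep = dist <= (win / 2.0) * np.sqrt(2.0) + margin
    return np.argwhere(keep)
```

For a purely horizontal camera translation with identical intrinsics, the epipolar line of a reference pixel is the image row through that pixel, so the helper returns a single row of windows. Raising `margin` widens the band, which is one simple way to tolerate small pose or calibration errors.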
