Paper Title

Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation

Paper Authors

Damien Robert, Bruno Vallet, Loic Landrieu

Paper Abstract

Recent works on 3D semantic segmentation propose to exploit the synergy between images and point clouds by processing each modality with a dedicated network and projecting learned 2D features onto 3D points. Merging large-scale point clouds and images raises several challenges, such as constructing a mapping between points and pixels, and aggregating features between multiple views. Current methods require mesh reconstruction or specialized sensors to recover occlusions, and use heuristics to select and aggregate available images. In contrast, we propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions. Our method can combine standard 2D and 3D networks and outperforms both 3D models operating on colorized point clouds and hybrid 2D/3D networks without requiring colorization, meshing, or true depth maps. We set a new state-of-the-art for large-scale indoor/outdoor semantic segmentation on S3DIS (74.7 mIoU 6-Fold) and on KITTI-360 (58.3 mIoU). Our full pipeline is accessible at https://github.com/drprojects/DeepViewAgg, and only requires raw 3D scans and a set of images and poses.
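
The abstract describes an attention-style aggregation in which each 3D point's viewing conditions (e.g., point-to-camera distance, angle of incidence) determine how 2D features from multiple images are merged onto that point. Below is a minimal PyTorch sketch of this idea; the module name MultiViewAggregator, the choice of viewing-condition descriptors, and the MLP scorer are illustrative assumptions, not the authors' actual implementation (see the DeepViewAgg repository for that).

```python
import torch
import torch.nn as nn


class MultiViewAggregator(nn.Module):
    """Merge per-view 2D features into one feature per 3D point, with
    attention weights predicted from each point-view pair's viewing
    conditions (a sketch; names and descriptors are assumptions)."""

    def __init__(self, n_view_cond: int = 4, hidden: int = 32):
        super().__init__()
        # Small MLP scoring each view from its viewing conditions,
        # e.g., viewing distance, incidence angle, pixel footprint.
        self.scorer = nn.Sequential(
            nn.Linear(n_view_cond, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, view_feats, view_cond, view_mask):
        # view_feats: (P, V, C) 2D features projected onto P points from V views
        # view_cond:  (P, V, n_view_cond) viewing conditions per point-view pair
        # view_mask:  (P, V) bool, True where the point is visible in that view
        scores = self.scorer(view_cond).squeeze(-1)             # (P, V)
        scores = scores.masked_fill(~view_mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)                  # (P, V)
        # Points visible in no view produce NaN weights; zero them out so
        # such points simply receive a zero feature.
        weights = torch.nan_to_num(weights, nan=0.0)
        return (weights.unsqueeze(-1) * view_feats).sum(dim=1)  # (P, C)


# Toy usage: 1000 points, 6 candidate views, 64-dim projected image features.
agg = MultiViewAggregator(n_view_cond=4)
point_feats = agg(
    torch.randn(1000, 6, 64),   # projected 2D features
    torch.randn(1000, 6, 4),    # viewing-condition descriptors
    torch.rand(1000, 6) > 0.3,  # visibility mask
)
assert point_feats.shape == (1000, 64)
```

Masking invisible views before the softmax ensures occluded or out-of-frame views contribute nothing, which is what lets the model handle images taken at arbitrary positions without meshing or true depth maps.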
