Paper Title

ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes

Paper Authors

Qi, Charles R., Chen, Xinlei, Litany, Or, Guibas, Leonidas J.

Abstract

3D object detection has seen quick progress thanks to advances in deep learning on point clouds. A few recent works have even shown state-of-the-art performance with just point clouds input (e.g. VoteNet). However, point cloud data have inherent limitations. They are sparse, lack color information and often suffer from sensor noise. Images, on the other hand, have high resolution and rich texture. Thus they can complement the 3D geometry provided by point clouds. Yet how to effectively use image information to assist point cloud based detection is still an open question. In this work, we build on top of VoteNet and propose a 3D detection architecture called ImVoteNet specialized for RGB-D scenes. ImVoteNet is based on fusing 2D votes in images and 3D votes in point clouds. Compared to prior work on multi-modal detection, we explicitly extract both geometric and semantic features from the 2D images. We leverage camera parameters to lift these features to 3D. To improve the synergy of 2D-3D feature fusion, we also propose a multi-tower training scheme. We validate our model on the challenging SUN RGB-D dataset, advancing state-of-the-art results by 5.7 mAP. We also provide rich ablation studies to analyze the contribution of each design choice.
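The abstract mentions lifting 2D image votes to 3D using camera parameters. As a rough illustration of the geometric idea (not the authors' implementation), the sketch below projects a 3D seed point into the image with a pinhole camera model, applies a 2D vote offset toward the object center, and back-projects the result at the seed's depth to form a pseudo 3D vote. The function name, the equal-depth assumption, and the intrinsics values are illustrative only.

```python
import numpy as np

def lift_2d_vote(seed_xyz, vote_uv, fx, fy, cx, cy):
    """Lift a 2D image vote to a pseudo 3D vote (illustrative sketch).

    seed_xyz: (3,) seed point in camera coordinates, z > 0.
    vote_uv:  (2,) 2D vote offset in pixels toward the object center.
    fx, fy, cx, cy: pinhole camera intrinsics.
    Assumes the object center shares the seed's depth, which is a
    simplification made by this sketch, not a claim about ImVoteNet.
    """
    x, y, z = seed_xyz
    # Project the seed into the image plane (pinhole model).
    u = fx * x / z + cx
    v = fy * y / z + cy
    # Apply the 2D vote to get the predicted 2D object center.
    u2, v2 = u + vote_uv[0], v + vote_uv[1]
    # Back-project the 2D center at the seed's depth into 3D.
    x2 = (u2 - cx) * z / fx
    y2 = (v2 - cy) * z / fy
    # The pseudo 3D vote is the offset from the seed to that point.
    return np.array([x2 - x, y2 - y, 0.0])
```

For example, a seed at depth 2 m with a 10-pixel horizontal vote and a 500-pixel focal length yields a lateral 3D offset of 10 * 2 / 500 = 0.04 m, showing how pixel-space votes translate into metric constraints on the object center.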
