3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection

Paper Title

3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection

Authors

He Wang, Yezhen Cong, Or Litany, Yue Gao, Leonidas J. Guibas

Abstract

3D object detection is an important yet demanding task that relies heavily on difficult-to-obtain 3D annotations. To reduce the required amount of supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes. We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled train set in the form of pseudo-labels. However, due to the high task complexity, we observe that the pseudo-labels suffer from significant noise and are thus not directly usable. To that end, we introduce a confidence-based filtering mechanism, inspired by FixMatch. We set confidence thresholds based upon the predicted objectness and class probability to filter low-quality pseudo-labels. While effective, we observe that these two measures do not sufficiently capture localization quality. We therefore propose to use the estimated 3D IoU as a localization metric and set category-aware self-adjusted thresholds to filter poorly localized proposals. We adopt VoteNet as our backbone detector on indoor datasets, while we use PV-RCNN on the autonomous driving dataset, KITTI. Our method consistently improves on state-of-the-art methods on both the ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios (including the fully labeled setting). For example, when training using only 10% labeled data on ScanNet, 3DIoUMatch achieves a 7.7% absolute improvement on mAP@0.25 and an 8.5% absolute improvement on mAP@0.5 over the prior art. On KITTI, we are the first to demonstrate semi-supervised 3D object detection, and our method surpasses a fully supervised baseline by 1.8% to 7.6% under different label ratios and categories.
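
The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `PseudoLabel` type, the `filter_pseudo_labels` function, and all threshold values are hypothetical, and the per-category IoU thresholds are fixed constants here, whereas the abstract describes them as self-adjusted. The sketch only shows a teacher prediction being kept as a pseudo-label when its objectness, class probability, and estimated 3D IoU all pass their thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PseudoLabel:
    """One teacher prediction on an unlabeled scene (hypothetical container)."""
    category: str
    objectness: float  # predicted objectness score in [0, 1]
    class_prob: float  # predicted probability of `category` in [0, 1]
    pred_iou: float    # estimated 3D IoU of the predicted box in [0, 1]


def filter_pseudo_labels(
    preds: List[PseudoLabel],
    tau_obj: float,
    tau_cls: float,
    tau_iou: Dict[str, float],
    default_tau_iou: float = 0.5,
) -> List[PseudoLabel]:
    """Keep predictions that pass all three confidence checks.

    All threshold values are illustrative assumptions, not values from the paper.
    `tau_iou` maps a category name to its own IoU threshold.
    """
    kept = []
    for p in preds:
        iou_threshold = tau_iou.get(p.category, default_tau_iou)
        if (
            p.objectness >= tau_obj
            and p.class_prob >= tau_cls
            and p.pred_iou >= iou_threshold
        ):
            kept.append(p)
    return kept


# Usage with made-up numbers: the second box is confidently classified but
# has a low estimated IoU, so it is treated as poorly localized and dropped.
teacher_preds = [
    PseudoLabel("chair", objectness=0.95, class_prob=0.92, pred_iou=0.70),
    PseudoLabel("chair", objectness=0.97, class_prob=0.95, pred_iou=0.20),
    PseudoLabel("table", objectness=0.60, class_prob=0.99, pred_iou=0.80),
]
kept = filter_pseudo_labels(
    teacher_preds, tau_obj=0.9, tau_cls=0.9, tau_iou={"chair": 0.5, "table": 0.4}
)
```

The second prediction shows why the abstract adds an IoU check: objectness and class probability are both high, yet the box would be a noisy localization target for the student.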
