Weakly Supervised 3D Object Detection from Lidar Point Cloud

Paper Title

Weakly Supervised 3D Object Detection from Lidar Point Cloud

Authors

Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc Van Gool, Dengxin Dai

Abstract

It is laborious to manually label point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised approach for 3D object detection that requires only a small set of weakly annotated scenes, associated with a few precisely labeled object instances. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under weak supervision, i.e., only the horizontal centers of objects are click-annotated on bird's-eye-view scenes. Stage-2 learns to refine the cylindrical proposals into cuboids with confidence scores, using a few well-labeled object instances. Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85-95% of the performance of current top-performing, fully supervised detectors (which require 3,712 exhaustively and precisely annotated scenes with 15,654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, allowing both automatic and active working modes. The annotations generated by our model can be used to train 3D object detectors that reach over 94% of their original performance (i.e., their performance when trained on manually labeled data). Our experiments also show our model's potential for improved performance given more training data. The above designs make our approach highly practical and introduce new opportunities for learning 3D object detection with a reduced annotation burden.
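As a purely geometric illustration of what a cylindrical proposal around a clicked center covers, the sketch below keeps every point whose horizontal distance to the click is within a radius, leaving height unconstrained. It is not the authors' Stage-1 network: the function name `points_in_cylinder`, the fixed `radius`, and the toy point cloud are all assumptions made for this example.

```python
import math

def points_in_cylinder(points, center_xy, radius):
    """Keep the points whose bird's-eye-view (x, y) distance to the
    clicked center is at most `radius`; height (z) is unconstrained.

    Hypothetical helper for illustration only, not from the paper.
    """
    cx, cy = center_xy
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

# Toy point cloud of (x, y, z) coordinates; values are made up.
cloud = [
    (0.0, 0.0, 0.2),
    (1.0, 1.0, 1.5),
    (5.0, 0.0, 0.3),
    (0.5, -0.5, -1.0),
]

# A single click at the origin with an assumed radius of 2.0.
inside = points_in_cylinder(cloud, center_xy=(0.0, 0.0), radius=2.0)
print(len(inside), "of", len(cloud), "points fall inside the cylinder")
```

In the method described above, such a cylinder is only a coarse region; the cuboid and confidence score come from the second stage, which this sketch does not attempt to model.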
