Multimodal Transformer for Automatic 3D Annotation and Object Detection

Paper Title

Multimodal Transformer for Automatic 3D Annotation and Object Detection

Authors

Liu, Chang; Qian, Xiaoyan; Huang, Binxiao; Qi, Xiaojuan; Lam, Edmund; Tan, Siew-Chong; Wong, Ngai

Abstract

Despite a growing number of datasets being collected for training 3D object detection models, significant human effort is still required to annotate 3D boxes on LiDAR scans. To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. To alleviate the pervasive sparsity problem that hinders existing autolabelers, MTrans densifies the sparse point clouds by generating new 3D points based on 2D image information. With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously. Experimental results verify the effectiveness of MTrans in improving the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively, than the state-of-the-art autolabeler. MTrans can also be extended to improve the accuracy of 3D object detection, resulting in a remarkable 89.45% AP on KITTI hard samples. Code is available at https://github.com/Cliu2/MTrans.
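
The abstract describes pairing LiDAR points with weak 2D bounding boxes. As a rough illustration of how such a pairing could begin, the sketch below projects 3D points into the image and keeps those that fall inside a 2D box. It is not taken from the paper or its repository: the function names, the pinhole projection matrix `P`, and the sample points are all hypothetical, and the points are assumed to be already expressed in the camera frame.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_point(
    P: Sequence[Sequence[float]], point: Point3D
) -> Optional[Tuple[float, float, float]]:
    """Project a camera-frame 3D point to pixels with a 3x4 matrix P.

    Returns (u, v, depth), or None if the point is not in front of the camera.
    """
    x, y, z = point
    homo = (x, y, z, 1.0)
    # Multiply each of the three rows of P by the homogeneous point.
    u_h, v_h, w = (sum(row[i] * homo[i] for i in range(4)) for row in P)
    if w <= 0.0:
        return None
    return u_h / w, v_h / w, w


def points_in_box(
    P: Sequence[Sequence[float]], points: List[Point3D], box: Box2D
) -> List[Point3D]:
    """Keep the points whose image projection lies inside the 2D box."""
    x_min, y_min, x_max, y_max = box
    kept = []
    for pt in points:
        proj = project_point(P, pt)
        if proj is None:
            continue
        u, v, _ = proj
        if x_min <= u <= x_max and y_min <= v <= y_max:
            kept.append(pt)
    return kept


# Hypothetical pinhole camera: focal length 100 px, principal point (50, 50).
P = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 50.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
points = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -1.0)]
box = (40.0, 40.0, 60.0, 60.0)
kept = points_in_box(P, points, box)
```

Points selected this way are typically few for distant or occluded objects, which is the sparsity problem the abstract says MTrans addresses by generating additional 3D points from image information.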

Paper Title