SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds

Paper Title

SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds

Authors

Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng, Dragomir Anguelov

Abstract

3D object detection in point clouds is a core component of modern robotics and autonomous driving systems. A key challenge in 3D object detection comes from the inherently sparse nature of point occupancy within the 3D scene. In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point clouds. Built upon the idea of window-based Transformers, SWFormer converts 3D points into sparse voxels and windows, and then processes these variable-length sparse windows efficiently using a bucketing scheme. In addition to self-attention within each spatial window, our SWFormer also captures cross-window correlation with multi-scale feature fusion and window shifting operations. To further address the unique challenge of detecting 3D objects accurately from sparse features, we propose a new voxel diffusion technique. Experimental results on the Waymo Open Dataset show that our SWFormer achieves a state-of-the-art 73.36 L2 mAPH on vehicle and pedestrian for 3D object detection on the official test set, outperforming all previous single-stage and two-stage models, while being much more efficient.
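
The abstract describes converting points into sparse voxels and windows, then handling the variable-length windows with a bucketing scheme. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes a 2D bird's-eye-view grid, and the function names (`voxelize`, `group_into_windows`, `bucket_windows`), the bucket capacities, and the truncation of oversized windows are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Return the sorted list of occupied voxel coordinates for (x, y) points.

    Only occupied voxels are kept, which is what makes the representation sparse.
    """
    occupied = {
        (math.floor(x / voxel_size), math.floor(y / voxel_size))
        for x, y in points
    }
    return sorted(occupied)


def group_into_windows(voxels, window_size, shift=0):
    """Group voxels into non-overlapping square windows.

    `shift` offsets the window grid, a simple stand-in for window shifting.
    """
    windows = defaultdict(list)
    for vx, vy in voxels:
        key = ((vx + shift) // window_size, (vy + shift) // window_size)
        windows[key].append((vx, vy))
    return dict(windows)


def bucket_windows(windows, bucket_sizes):
    """Place each window in the smallest bucket that fits it, padding with None.

    Windows in the same bucket share one length, so they could be batched
    together for attention. Oversized windows are truncated to the largest
    capacity (an assumption made here for simplicity).
    """
    capacities = sorted(bucket_sizes)
    buckets = {capacity: [] for capacity in capacities}
    for key, voxels in windows.items():
        capacity = next((c for c in capacities if len(voxels) <= c), capacities[-1])
        kept = voxels[:capacity]
        padded = kept + [None] * (capacity - len(kept))
        buckets[capacity].append((key, padded))
    return buckets


if __name__ == "__main__":
    points = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (5.0, 5.0)]
    voxels = voxelize(points, voxel_size=1.0)
    windows = group_into_windows(voxels, window_size=2)
    for capacity, members in bucket_windows(windows, [1, 2, 4]).items():
        print(capacity, members)
```

Padding each window only up to its bucket capacity, rather than to the size of the densest window, is what keeps the wasted computation small when most windows hold few voxels.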
