Exploiting More Information in Sparse Point Cloud for 3D Single Object Tracking

Paper Title

Exploiting More Information in Sparse Point Cloud for 3D Single Object Tracking

Authors

Yubo Cui, Jiayao Shan, Zuoxu Gu, Zhiheng Li, Zheng Fang

Abstract

3D single object tracking is a key task in 3D computer vision. However, the sparsity of point clouds makes it difficult to compute similarity and locate the object, which poses a major challenge for 3D trackers. Previous works addressed this problem and improved tracking performance in common scenarios, but they usually fail in extremely sparse scenarios, such as when the tracked object is far away or partially occluded. To address these problems, in this letter, we propose a sparse-to-dense, transformer-based framework for 3D single object tracking. First, we transform the sparse 3D points into 3D pillars and then compress them into 2D BEV features to obtain a dense representation. Then, we propose an attention-based encoder that computes global similarity between the template and search branches, which alleviates the influence of sparsity. The encoder also applies attention to multi-scale features to compensate for the lack of information caused by the sparsity of the point cloud and by the use of a single feature scale. Finally, we use set prediction to track the object through a two-stage decoder that also utilizes attention. Extensive experiments show that our method achieves very promising results on the KITTI and nuScenes datasets.
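The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it names: scattering sparse points into a dense BEV pillar grid, and letting every search-region cell attend to all template cells. The function names (`pillarize`, `flatten`, `cross_attention`), the per-pillar features (point count and mean height), and the grid parameters are assumptions made for illustration. The learned pillar encoder, the multi-scale features, and the two-stage decoder described by the authors are not reproduced here.

```python
import math


def pillarize(points, x_range, y_range, pillar_size):
    """Scatter (x, y, z) points into a dense 2D BEV grid.

    Each cell holds [point count, mean z]; empty pillars stay at zero,
    so the output is dense even when the input is sparse.
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))
    counts = [[0] * nx for _ in range(ny)]
    z_sums = [[0.0] * nx for _ in range(ny)]
    for x, y, z in points:
        # Skip points that fall outside the BEV window.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        ix = min(int((x - x_range[0]) / pillar_size), nx - 1)
        iy = min(int((y - y_range[0]) / pillar_size), ny - 1)
        counts[iy][ix] += 1
        z_sums[iy][ix] += z
    return [
        [[float(c), (s / c if c else 0.0)] for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(counts, z_sums)
    ]


def flatten(bev):
    """Turn an H x W x C grid into a list of H*W feature tokens."""
    return [cell for row in bev for cell in row]


def cross_attention(search_tokens, template_tokens):
    """Scaled dot-product attention without learned projections.

    Every search token is compared with all template tokens, so the
    similarity is global rather than restricted to a local neighbourhood.
    """
    d = len(template_tokens[0])
    fused = []
    for q in search_tokens:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in template_tokens
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, template_tokens)) for j in range(d)]
        )
    return fused
```

A possible usage, again under the same assumptions: build `flatten(pillarize(template_points, ...))` and `flatten(pillarize(search_points, ...))` with the same `pillar_size`, then pass them to `cross_attention(search_tokens, template_tokens)` to obtain one fused feature per search cell.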
