Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Paper Title

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Authors

Xu, Shuangjie, Wan, Rui, Ye, Maosheng, Zou, Xiaoyi, Cao, Tongyi

Abstract

Two major challenges of 3D LiDAR panoptic segmentation (PS) are that an object's points are aggregated on its surface, which makes long-range dependencies hard to model, especially for large instances, and that neighboring objects are often too close together to separate. Recent work addresses these problems with time-consuming grouping processes, such as dual clustering or mean-shift offsets, or with dense bird's-eye-view (BEV) centroid representations that downplay geometry. However, the local feature learning used by these methods does not sufficiently model long-range geometric relationships. To this end, we present SCAN, a novel sparse cross-scale attention network that first aligns multi-scale sparse features with global voxel-encoded attention to capture long-range relationships in instance context, which can improve regression accuracy for over-segmented large objects. For the surface-aggregated points, SCAN adopts a novel sparse, class-agnostic representation of instance centroids, which not only preserves the sparsity of the aligned features to address under-segmentation of small objects, but also reduces the network's computation through sparse convolution. Our method outperforms previous methods by a large margin on the SemanticKITTI dataset for the challenging 3D PS task, achieving 1st place with real-time inference speed.
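The abstract contrasts dense BEV centroid maps with a sparse, class-agnostic representation of instance centroids. The sketch below illustrates that general idea under our own assumptions; it is not SCAN's actual formulation. Centroid targets are computed per instance id (ignoring semantic class) and stored only at the N occupied voxels, so memory scales with N rather than with a dense grid. The Gaussian score, the `sigma` parameter, the `-1` label for non-instance voxels and the function name `sparse_centroid_targets` are all hypothetical choices made for illustration.

```python
import numpy as np


def sparse_centroid_targets(coords, instance_ids, sigma=1.0):
    """Hypothetical sketch of class-agnostic centroid targets on occupied voxels only.

    coords:       (N, 3) integer coordinates of the N occupied voxels.
    instance_ids: (N,) instance id per voxel; -1 marks non-instance ("stuff") voxels
                  (an assumed convention, not taken from the paper).
    sigma:        assumed Gaussian width, in voxel units.

    Returns (heatmap, offsets):
      heatmap: (N,) centroid score in [0, 1], zero for stuff voxels.
      offsets: (N, 3) vector from each voxel to its instance centroid, zero for stuff.
    """
    coords = np.asarray(coords, dtype=np.float64)
    instance_ids = np.asarray(instance_ids)
    heatmap = np.zeros(len(coords))
    offsets = np.zeros_like(coords)

    for inst in np.unique(instance_ids):
        if inst < 0:
            continue  # stuff voxels receive no centroid target
        mask = instance_ids == inst
        # Class-agnostic: the centroid depends only on instance membership,
        # never on the semantic label of the voxels.
        centroid = coords[mask].mean(axis=0)
        diff = centroid - coords[mask]
        offsets[mask] = diff
        dist2 = (diff ** 2).sum(axis=1)
        heatmap[mask] = np.exp(-dist2 / (2.0 * sigma ** 2))

    return heatmap, offsets


# Two voxels of one instance plus one stuff voxel.
heat, off = sparse_centroid_targets([[0, 0, 0], [2, 0, 0], [10, 10, 0]], [0, 0, -1])
print(heat)  # both instance voxels are 1 voxel from the centroid (1, 0, 0)
print(off)
```

Because the targets live on the same occupied voxels as the sparse features, a sparse-convolution head could in principle predict them without densifying into a BEV grid, which is the efficiency property the abstract emphasizes. The paper's own target definition and grouping step may differ.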