Paper Title

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Authors

Shuangjie Xu, Rui Wan, Maosheng Ye, Xiaoyi Zou, Tongyi Cao

Abstract

Two major challenges of 3D LiDAR Panoptic Segmentation (PS) are that point clouds of an object are surface-aggregated and thus hard to model the long-range dependency especially for large instances, and that objects are too close to separate each other. Recent literature addresses these problems by time-consuming grouping processes such as dual-clustering, mean-shift offsets, etc., or by bird-eye-view (BEV) dense centroid representation that downplays geometry. However, the long-range geometry relationship has not been sufficiently modeled by local feature learning from the above methods. To this end, we present SCAN, a novel sparse cross-scale attention network to first align multi-scale sparse features with global voxel-encoded attention to capture the long-range relationship of instance context, which can boost the regression accuracy of the over-segmented large objects. For the surface-aggregated points, SCAN adopts a novel sparse class-agnostic representation of instance centroids, which can not only maintain the sparsity of aligned features to solve the under-segmentation on small objects, but also reduce the computation amount of the network through sparse convolution. Our method outperforms previous methods by a large margin in the SemanticKITTI dataset for the challenging 3D PS task, achieving 1st place with a real-time inference speed.
