Paper Title
PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding
Paper Authors
Paper Abstract
Arguably one of the top success stories of deep learning is transfer learning. The finding that pre-training a network on a rich source set (e.g., ImageNet) can boost performance once it is fine-tuned on a usually much smaller target set has been instrumental to many applications in language and vision. Yet very little is known about its usefulness in 3D point cloud understanding. We see this as an opportunity, considering the effort required to annotate data in 3D. In this work, we aim to facilitate research on 3D representation learning. Different from previous works, we focus on high-level scene understanding tasks. To this end, we select a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes. Our findings are extremely encouraging: using a unified triplet of architecture, source dataset, and contrastive loss for pre-training, we achieve improvements over recent best results in segmentation and detection across 6 different benchmarks for indoor and outdoor, real and synthetic datasets -- demonstrating that the learned representation can generalize across domains. Furthermore, the improvement is similar to that obtained with supervised pre-training, suggesting that future efforts should favor scaling data collection over more detailed annotation. We hope these findings will encourage more research on unsupervised pretext-task design for 3D deep learning.
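The "contrastive loss" in the pre-training triplet above operates at the level of individual points. As a rough illustration only, here is a minimal PyTorch sketch of a point-level InfoNCE-style objective of the kind such pre-training uses: points matched across two views of the same scene form positive pairs, and the remaining matched points act as negatives. The function name `point_info_nce`, the tensor layout, and the temperature value are illustrative assumptions, not details quoted from the paper.

```python
import torch
import torch.nn.functional as F

def point_info_nce(feats_a: torch.Tensor, feats_b: torch.Tensor,
                   tau: float = 0.07) -> torch.Tensor:
    """Point-level InfoNCE-style contrastive loss (illustrative sketch).

    feats_a, feats_b: (N, D) features of N matched points, where row i of
    feats_a and row i of feats_b describe the same physical point observed
    in two different views. All other rows serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    feats_a = F.normalize(feats_a, dim=1)
    feats_b = F.normalize(feats_b, dim=1)
    # (N, N) similarity matrix; matched pairs lie on the diagonal.
    logits = feats_a @ feats_b.t() / tau
    targets = torch.arange(feats_a.size(0), device=feats_a.device)
    # Cross-entropy pulls each diagonal (positive) score above its row's
    # off-diagonal (negative) scores.
    return F.cross_entropy(logits, targets)
```

In a full pipeline, the two feature sets would come from a shared 3D backbone applied to two registered views of the same scene, with point correspondences obtained from the known alignment; this sketch assumes those matched features are already computed.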