P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

Authors

Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

Abstract

Pre-training large models on large-scale datasets has become a crucial topic in deep learning. Pre-trained models with high representation ability and transferability have achieved great success and dominate many downstream tasks in natural language processing and 2D vision. However, it is non-trivial to extend this pretraining-tuning paradigm to 3D vision, because training data are limited and relatively inconvenient to collect. In this paper, we provide a new perspective on this problem by leveraging pre-trained 2D knowledge in the 3D domain: we tune pre-trained image models with the novel Point-to-Pixel Prompting for point cloud analysis at a minor parameter cost. Following the principle of prompt engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. We conduct extensive experiments to demonstrate that, combined with our proposed Point-to-Pixel Prompting, a better pre-trained image model leads to consistently better performance in 3D vision. Benefiting from the rapid development of the image pre-training field, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with far fewer trainable parameters. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation. Code is available at https://github.com/wangzy22/P2P.
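As a rough illustration of the point-to-pixel idea in the abstract, the sketch below projects a small point cloud onto a 2D grid and colors each pixel from depth. It is not the authors' implementation. The function name `project_to_image`, the orthographic projection along z, the keep-the-largest-z rule, and the fixed depth-to-color ramp are all assumptions made for this example. In particular, the hand-written ramp only stands in for the geometry-aware coloring described above, and the frozen pre-trained image model that would consume the image is omitted.

```python
def project_to_image(points, size=8):
    """Orthographically project 3D points onto a size x size RGB grid.

    Hypothetical helper, not taken from the P2P codebase. Each pixel keeps
    the point with the largest z, and a fixed depth-to-color ramp stands in
    for the geometry-aware coloring mentioned in the abstract.
    """
    if not points:
        raise ValueError("points must be non-empty")

    # Bounding box of the cloud, used to normalize coordinates to [0, 1].
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Fall back to 1.0 when an axis has zero extent to avoid dividing by zero.
    spans = [(maxs[i] - mins[i]) or 1.0 for i in range(3)]

    black = (0.0, 0.0, 0.0)
    image = [[black for _ in range(size)] for _ in range(size)]
    depth = [[-1.0 for _ in range(size)] for _ in range(size)]

    for x, y, z in points:
        u = (x - mins[0]) / spans[0]
        v = (y - mins[1]) / spans[1]
        d = (z - mins[2]) / spans[2]
        # Quantize normalized x, y into pixel indices, clamping the upper edge.
        col = min(int(u * size), size - 1)
        row = min(int(v * size), size - 1)
        # Keep only the point with the largest normalized depth per pixel.
        if d > depth[row][col]:
            depth[row][col] = d
            image[row][col] = (d, 1.0 - d, 0.5)  # assumed color ramp
    return image


cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.5)]
img = project_to_image(cloud, size=4)
print(img[0][0])  # pixel shared by the first and third points
print(img[3][3])  # pixel hit by the second point
```

Pixels that receive no point stay black. An image produced this way has the same grid layout as an ordinary RGB image, which is what allows an image model to take it as input.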
