MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

Paper Title

MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

Authors

Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang

Abstract

Traditional methods of reconstructing 3D human pose and mesh from single images rely on paired image-mesh datasets, which can be difficult and expensive to obtain. Due to this limitation, model scalability is constrained as well as reconstruction performance. Towards addressing the challenge, we introduce Mesh Pre-Training (MPT), an effective pre-training strategy that leverages large amounts of MoCap data to effectively perform pre-training at scale. We introduce the use of MoCap-generated heatmaps as input representations to the mesh regression transformer and propose a Masked Heatmap Modeling approach for improving pre-training performance. This study demonstrates that pre-training using the proposed MPT allows our models to perform effective inference without requiring fine-tuning. We further show that fine-tuning the pre-trained MPT model considerably improves the accuracy of human mesh reconstruction from single images. Experimental results show that MPT outperforms previous state-of-the-art methods on Human3.6M and 3DPW datasets. As a further application, we benchmark and study MPT on the task of 3D hand reconstruction, showing that our generic pre-training scheme generalizes well to hand pose estimation and achieves promising reconstruction performance.
