Paper Title
MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation
Paper Authors
Paper Abstract
Cross-view feature fusion is the key to addressing the occlusion problem in human pose estimation. Current fusion methods need to train a separate model for every pair of cameras, making them difficult to scale. In this work, we introduce MetaFuse, a pre-trained fusion model learned from a large number of cameras in the Panoptic dataset. The model can be efficiently adapted or finetuned for a new pair of cameras using a small number of labeled images. The strong adaptation power of MetaFuse is due in large part to the proposed factorization of the original fusion model into two parts: (1) a generic fusion model shared by all cameras, and (2) lightweight camera-dependent transformations. Furthermore, the generic model is learned from many cameras by a meta-learning-style algorithm to maximize its adaptation capability to various camera poses. We observe in experiments that MetaFuse finetuned on the public datasets outperforms the state-of-the-art methods by a large margin, which validates its value in practice.
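The factorization described in the abstract can be illustrated with a toy sketch. Everything below is illustrative, not the paper's architecture: `fuse`, `adapt`, and the scalar scale/offset `theta` are hypothetical stand-ins for MetaFuse's camera-dependent affine transformations, and the sketch omits both the heatmap/epipolar-geometry fusion and the meta-training of the generic part across many cameras. It only shows the core idea: a large generic fusion matrix stays frozen, and adapting to a new camera pair means fitting a handful of camera-dependent parameters on a few labeled samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(x, W, theta):
    """Map source-view features into the target view.

    W     -- generic fusion matrix, shared by all camera pairs
    theta -- lightweight camera-dependent parameters (a scale and an
             offset here; a stand-in for the paper's affine transform)
    """
    return theta[0] * (x @ W) + theta[1]

def mse(W, theta, x, y):
    return float(np.mean((fuse(x, W, theta) - y) ** 2))

def adapt(W, x, y, steps=100, lr=0.1):
    """Fit only the camera-dependent part on a few labeled samples,
    keeping the generic fusion matrix W frozen."""
    theta = np.array([1.0, 0.0])
    for _ in range(steps):
        err = fuse(x, W, theta) - y
        # gradient of the mean-squared error w.r.t. (scale, offset)
        grad = 2.0 * np.array([np.mean(err * (x @ W)), np.mean(err)])
        theta -= lr * grad
    return theta

# Toy "new camera pair": its geometry is an affine re-weighting of the
# generic fusion (hypothetical numbers, for illustration only).
d, n = 4, 32
W_generic = rng.normal(size=(d, d)) / 2   # pretend this was pre-trained
x = rng.normal(size=(n, d))               # source-view features
y = 1.5 * (x @ W_generic) - 0.3           # target-view ground truth

before = mse(W_generic, np.array([1.0, 0.0]), x, y)
theta_new = adapt(W_generic, x, y)
after = mse(W_generic, theta_new, x, y)
```

Because only the two entries of `theta` are fitted, a small number of labeled images suffices; in the paper the same role is played by the lightweight camera-dependent transformations, while the generic part is additionally meta-trained so that such few-shot adaptation works well for unseen camera poses.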