Paper Title


Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision

Authors

Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt

Abstract

Egocentric 3D human pose estimation with a single fisheye camera has drawn a significant amount of attention recently. However, existing methods struggle with pose estimation from in-the-wild images, because they can only be trained on synthetic data due to the unavailability of large-scale in-the-wild egocentric datasets. Furthermore, these methods easily fail when the body parts are occluded by or interacting with the surrounding scene. To address the shortage of in-the-wild data, we collect a large-scale in-the-wild egocentric dataset called Egocentric Poses in the Wild (EgoPW). This dataset is captured by a head-mounted fisheye camera and an auxiliary external camera, which provides an additional observation of the human body from a third-person perspective during training. We present a new egocentric pose estimation method, which can be trained on the new dataset with weak external supervision. Specifically, we first generate pseudo labels for the EgoPW dataset with a spatio-temporal optimization method by incorporating the external-view supervision. The pseudo labels are then used to train an egocentric pose estimation network. To facilitate the network training, we propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model. The experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
