Paper title
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses
Paper authors
Paper abstract
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth. Several previous works address this inverse problem by generating multiple hypotheses. However, these works are strongly supervised and require ground-truth 2D-to-3D correspondences, which can be difficult to obtain. In this paper, we propose a weakly supervised deep generative network that addresses the inverse problem and circumvents the need for ground-truth 2D-to-3D correspondences. To this end, we design our network to model a proposal distribution, which we use to approximate the unknown multi-modal target posterior distribution. We achieve the approximation by minimizing the KL divergence between the proposal and target distributions, which leads to a 2D reprojection error and a prior loss term that can be weakly supervised. Furthermore, we determine the most probable solution as the conditional mode of the samples using the mean-shift algorithm. We evaluate our method on three benchmark datasets: Human3.6M, MPII and MPI-INF-3DHP. Experimental results show that our approach is capable of generating multiple feasible hypotheses and achieves state-of-the-art results compared to existing weakly supervised approaches. Our source code is available at the project website.
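The mode-selection step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each sampled hypothesis has been flattened into a vector, uses a Gaussian kernel with a hand-picked bandwidth, and returns the converged point with the highest kernel density. The function name `mean_shift_mode` and all of its parameters are hypothetical.

```python
import numpy as np


def mean_shift_mode(samples, bandwidth=1.0, n_iter=100, tol=1e-6):
    """Estimate the densest point (mode) of a set of sampled hypotheses.

    samples: array of shape (n_samples, dim), e.g. flattened 3D poses.
    bandwidth: Gaussian kernel width (an assumed, hand-tuned value).
    """
    samples = np.asarray(samples, dtype=float)

    def kernel_weights(x):
        # Gaussian kernel weight of every sample relative to point x.
        sq_dist = np.sum((samples - x) ** 2, axis=1)
        return np.exp(-0.5 * sq_dist / bandwidth ** 2)

    best_point, best_density = None, -np.inf
    for x in samples:
        # Run mean shift from each sample until the update is negligible.
        for _ in range(n_iter):
            w = kernel_weights(x)
            x_new = w @ samples / w.sum()
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        # Keep the converged point with the highest kernel density.
        density = kernel_weights(x).sum()
        if density > best_density:
            best_point, best_density = x, density
    return best_point


# Toy usage: 80 hypotheses near the origin and 20 near (5, 5, 5).
rng = np.random.default_rng(0)
hypotheses = np.vstack([
    rng.normal(0.0, 0.1, size=(80, 3)),
    rng.normal(5.0, 0.1, size=(20, 3)),
])
mode = mean_shift_mode(hypotheses, bandwidth=0.5)
```

In this toy setup the returned point lies in the larger cluster, which is the behaviour one would want when picking a single most probable pose from multi-modal samples. The bandwidth controls how nearby modes merge and would need tuning for real pose data.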