论文标题
悬崖:将完整框架的位置信息带入人体姿势和形状估计
CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation
论文作者
论文摘要
自上而下的方法主导了3D人类姿势和形状估计的领域,因为它们与人类检测脱钩,并使研究人员可以专注于核心问题。但是,裁剪是他们的第一步,从一开始就丢弃了位置信息,这使自己无法准确预测原始相机坐标系中的全局旋转。为了解决此问题,我们建议将完整框架(悬崖)的位置信息携带到此任务中。具体而言,我们通过将裁剪图像功能与其边界盒信息连接在一起来喂食更多的整体功能来悬崖。我们通过更广泛的全帧视图计算2D再投影损失,进行了类似于图像中投射的人的投影过程。克里夫(Cliff)通过全球态度感知信息进行了喂养和监督,直接预测了全球旋转以及更准确的明确姿势。此外,我们提出了一个基于Cliff的伪基真实注释,该注释为野外2D数据集提供了高质量的3D注释,并为基于回归的方法提供了重要的全面监督。对流行基准测试的广泛实验表明,悬崖的表现高于先前的艺术,并在Agora排行榜上获得了第一名(SMPL-Algorithms Track)。代码和数据可在https://github.com/huawei-noah/noah-research/tree/master/master/cliff中找到。
Top-down methods dominate the field of 3D human pose and shape estimation, because they are decoupled from human detection and allow researchers to focus on the core problem. However, cropping, their first step, discards the location information from the very beginning, which makes themselves unable to accurately predict the global rotation in the original camera coordinate system. To address this problem, we propose to Carry Location Information in Full Frames (CLIFF) into this task. Specifically, we feed more holistic features to CLIFF by concatenating the cropped-image feature with its bounding box information. We calculate the 2D reprojection loss with a broader view of the full frame, taking a projection process similar to that of the person projected in the image. Fed and supervised by global-location-aware information, CLIFF directly predicts the global rotation along with more accurate articulated poses. Besides, we propose a pseudo-ground-truth annotator based on CLIFF, which provides high-quality 3D annotations for in-the-wild 2D datasets and offers crucial full supervision for regression-based methods. Extensive experiments on popular benchmarks show that CLIFF outperforms prior arts by a significant margin, and reaches the first place on the AGORA leaderboard (the SMPL-Algorithms track). The code and data are available at https://github.com/huawei-noah/noah-research/tree/master/CLIFF.