HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization

Paper Title

HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization

Authors

Jiahao Lin, Gim Hee Lee

Abstract

Current works on multi-person 3D pose estimation mainly focus on the estimation of the 3D joint locations relative to the root joint and ignore the absolute locations of each pose. In this paper, we propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization in the camera coordinate space. Our HDNet first estimates the 2D human pose with heatmaps of the joints. These estimated heatmaps serve as attention masks for pooling features from image regions corresponding to the target person. A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints. We formulate the target depth regression as a bin index estimation problem, which can be transformed with a soft-argmax operation from the classification output of our HDNet. We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets, i.e., Human3.6M and MuPoTS-3D. The experimental results show that we outperform the previous state-of-the-art consistently under multiple evaluation metrics. Our source code is available at: https://github.com/jiahaoLjh/HumanDepth.
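
Two steps named in the abstract, heatmap-guided feature pooling and soft-argmax over depth bins, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array shapes, the uniform bin spacing, and the `d_min` / `d_max` range are all assumptions made for illustration, and the skeleton-based GNN step is omitted.

```python
import numpy as np


def heatmap_attention_pool(features, heatmaps, eps=1e-8):
    """Pool one feature vector per joint, using each heatmap as attention.

    features: (C, H, W) image feature map (hypothetical shape).
    heatmaps: (J, H, W) non-negative joint heatmaps (hypothetical shape).
    Returns a (J, C) array of heatmap-weighted feature averages.
    """
    features = np.asarray(features, dtype=np.float64)
    heatmaps = np.asarray(heatmaps, dtype=np.float64)
    # Normalise every heatmap so its weights sum to (almost exactly) one.
    weights = heatmaps / (heatmaps.sum(axis=(1, 2), keepdims=True) + eps)
    # Weighted sum over the spatial dimensions for every joint and channel.
    return np.einsum("jhw,chw->jc", weights, features)


def soft_argmax_bin_index(logits):
    """Expected bin index under the softmax distribution of the logits."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    indices = np.arange(logits.shape[0])
    return float((probs * indices).sum())


def bin_index_to_depth(index, d_min, d_max, num_bins):
    """Map a continuous bin index to a depth value.

    Assumes num_bins uniform bins over [d_min, d_max]; the binning used
    in the paper may differ.
    """
    bin_width = (d_max - d_min) / num_bins
    return d_min + (index + 0.5) * bin_width  # centre of the (fractional) bin
```

Because the soft-argmax is an expectation rather than a hard maximum, the predicted index is continuous and differentiable, which is what allows a classification output to be trained as a depth regressor end to end.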
