Paper Title
Unsupervised Domain Adaptation Learning for Hierarchical Infant Pose Recognition with Synthetic Data
Paper Authors
Paper Abstract
The Alberta Infant Motor Scale (AIMS) is a well-known assessment scheme that evaluates the gross motor development of infants by recording the number of specific poses achieved. With the aid of an image-based pose recognition model, the AIMS evaluation procedure can be shortened and automated, providing an early indicator of potential developmental disorders. Due to the scarcity of public infant-related datasets, many works use SMIL-based methods to generate synthetic infant images for training. However, the domain mismatch between real and synthetic training samples often leads to performance degradation during inference. In this paper, we present a CNN-based model that takes any infant image as input and predicts both coarse- and fine-level pose labels. The model consists of an image branch and a pose branch: the former generates coarse-level logits facilitated by unsupervised domain adaptation, while the latter produces 3D keypoints using HRNet with SMPLify optimization. The outputs of both branches are then fed into a hierarchical pose recognition module to estimate the fine-level pose labels. We also collect and label a new AIMS dataset containing 750 real and 4,000 synthetic infant images with AIMS pose labels. Our experimental results show that the proposed method effectively aligns the distributions of the synthetic and real-world datasets, achieving accurate fine-grained infant pose recognition.
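To make the two-branch design concrete, the following is a minimal PyTorch-style sketch of the data flow described in the abstract: an image branch producing coarse-level logits and a hierarchical module that fuses those logits with 3D keypoints (assumed to come from the HRNet + SMPLify pose branch) to predict fine-level labels. All module names, layer sizes, the number of joints, and the coarse/fine class counts are illustrative assumptions; the paper's actual backbone, domain adaptation objective, and fusion design are not reproduced here.

# Minimal sketch of the two-branch architecture (assumptions noted in comments).
import torch
import torch.nn as nn


class ImageBranch(nn.Module):
    """CNN branch mapping an infant image to coarse-level pose logits (assumed 4 postural positions)."""

    def __init__(self, num_coarse: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for the paper's CNN encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_coarse)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(image))


class HierarchicalPoseRecognizer(nn.Module):
    """Fuses coarse logits with 3D keypoints to predict fine-level AIMS labels (assumed 58 items)."""

    def __init__(self, num_coarse: int = 4, num_joints: int = 24, num_fine: int = 58):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(num_coarse + num_joints * 3, 128), nn.ReLU(),
            nn.Linear(128, num_fine),
        )

    def forward(self, coarse_logits: torch.Tensor, keypoints_3d: torch.Tensor) -> torch.Tensor:
        # keypoints_3d: (B, num_joints, 3); in the paper these would be produced
        # by the pose branch (HRNet 2D keypoints refined with SMPLify).
        fused = torch.cat([coarse_logits, keypoints_3d.flatten(1)], dim=1)
        return self.head(fused)


if __name__ == "__main__":
    image = torch.randn(2, 3, 256, 256)            # dummy infant images
    keypoints_3d = torch.randn(2, 24, 3)           # dummy 3D keypoints from the pose branch
    coarse_logits = ImageBranch()(image)
    fine_logits = HierarchicalPoseRecognizer()(coarse_logits, keypoints_3d)
    print(coarse_logits.shape, fine_logits.shape)  # torch.Size([2, 4]) torch.Size([2, 58])

The unsupervised domain adaptation between synthetic and real images (e.g., an adversarial or distribution-alignment loss on the image-branch features) would be applied during training and is omitted from this sketch.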