Paper Title

Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation

Authors

Liu, Shuying; Wu, Wenbin; Wu, Jiaxian; Lin, Yue

Abstract

We propose an approach to estimate arm and hand dynamics from monocular video by exploiting the relationship between the arm and the hand. Although monocular full-body human motion capture has made great progress in recent years, recovering accurate and plausible arm twists and hand gestures from in-the-wild videos remains a challenge. Our solution builds on the fact that, in most real situations, arm poses and hand gestures are highly correlated. To fully exploit this arm-hand correlation as well as inter-frame information, we carefully design a Spatial-Temporal Parallel Arm-Hand Motion Transformer (PAHMT) that predicts arm and hand dynamics simultaneously. We also introduce new losses that encourage the estimates to be smooth and accurate. In addition, we collect a motion capture dataset containing 200K frames of hand gestures and use it to train our model. By integrating a 2D hand pose estimation model and a 3D human pose estimation model, the proposed method produces plausible arm and hand dynamics from monocular video. Extensive evaluations demonstrate that the proposed method has advantages over previous state-of-the-art approaches and is robust across a variety of challenging scenarios.
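
The abstract does not describe PAHMT's internals. As a rough illustration only, the NumPy sketch below shows one way a "parallel" spatial-temporal attention layer could be organized. In this layout a spatial branch lets arm and hand joint tokens attend to one another within each frame, and a temporal branch lets each joint attend to itself across frames. The two branches are computed side by side and then summed. Several choices here are assumptions made for the sketch and should not be read as the authors' design or losses: single-head attention, fusion by a residual sum, the hypothetical function names (`parallel_st_block`, `smoothness_loss`), and the acceleration-based smoothness term.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the second-to-last axis.
    # x: (..., n_tokens, d); wq, wk, wv: (d, d) projection matrices.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def parallel_st_block(x, spatial_w, temporal_w):
    # Hypothetical layer. x: (T frames, J joints, d features);
    # arm and hand joints share the J axis so they can attend to each other.
    # Spatial branch: within each frame, every joint attends over all J joints.
    spatial = self_attention(x, *spatial_w)                  # (T, J, d)
    # Temporal branch: each joint attends over its own T-frame trajectory.
    xt = np.swapaxes(x, 0, 1)                                # (J, T, d)
    temporal = np.swapaxes(self_attention(xt, *temporal_w), 0, 1)  # (T, J, d)
    # Assumed fusion: residual connection plus the sum of both branches.
    return x + spatial + temporal

def smoothness_loss(pred):
    # Assumed smoothness term. pred: (T, J, C) per-frame joint parameters.
    # Penalizes second-order finite differences (acceleration) over time.
    accel = pred[2:] - 2 * pred[1:-1] + pred[:-2]
    return float(np.mean(accel ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, J, d = 8, 5, 16  # frames, arm+hand joints, feature width (arbitrary)
    x = rng.standard_normal((T, J, d))
    make_w = lambda: tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    out = parallel_st_block(x, make_w(), make_w())
    print(out.shape)
    print(smoothness_loss(out))
```

Running the spatial and temporal branches in parallel means neither branch sees the other's output before fusion. A sequential (stacked) design would instead feed one branch's result into the other. Which variant PAHMT actually uses, and how its losses are defined, should be checked against the full paper.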
