Paper Title
SeqHAND:RGB-Sequence-Based 3D Hand Pose and Shape Estimation
Paper Authors
Paper Abstract
3D hand pose estimation based on RGB images has been studied for a long time. Most of the studies, however, have performed frame-by-frame estimation based on independent static images. In this paper, we attempt not only to consider the appearance of a hand but also to incorporate the temporal movement information of a hand in motion into the learning framework for better 3D hand pose estimation performance, which creates the need for a large-scale dataset with sequential RGB hand images. We propose a novel method that generates a synthetic dataset mimicking natural human hand movements by re-engineering the annotations of an extant static hand pose dataset into pose-flows. With the generated dataset, we train a newly proposed recurrent framework, exploiting visuo-temporal features from sequential images of synthetic hands in motion and emphasizing the temporal smoothness of estimations with a temporal consistency constraint. Our novel training strategy of detaching the recurrent layer of the framework during domain finetuning from synthetic to real preserves the visuo-temporal features learned from sequential synthetic hand images. Sequentially estimated hand poses consequently produce natural and smooth hand movements, which lead to more robust estimations. We show that utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimation, outperforming state-of-the-art methods in experiments on hand pose estimation benchmarks.
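The temporal consistency constraint mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the function below is a hypothetical version that penalizes the squared difference between consecutive per-frame pose estimates, which is a common way to encourage temporal smoothness:

```python
import numpy as np

def temporal_consistency_loss(poses):
    """Mean squared displacement between consecutive pose estimates.

    poses: array of shape (T, J, 3) -- a sequence of T frames, each with
    J 3D joint positions. A hypothetical stand-in for the paper's
    temporal consistency constraint, not the authors' exact loss.
    """
    diffs = poses[1:] - poses[:-1]          # frame-to-frame joint motion
    return float(np.mean(np.sum(diffs ** 2, axis=-1)))

# A perfectly still hand incurs zero penalty, while frame-to-frame
# jitter in the estimates is penalized.
still = np.zeros((5, 21, 3))                # 5 frames, 21 joints
jitter = np.random.default_rng(0).normal(size=(5, 21, 3))
print(temporal_consistency_loss(still))     # 0.0
print(temporal_consistency_loss(jitter) > 0)
```

Such a term would be added to the per-frame pose and shape losses during training on the sequential synthetic data.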