Paper Title
OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos
Paper Authors
Paper Abstract
Although many approaches for multi-human pose estimation in videos have shown impressive results, they require densely annotated data, which entails excessive manual labor. Moreover, occlusion and motion blur inevitably lead to poor estimation performance. To address these problems, we propose a method that leverages an attention mask for occluded joints and encodes temporal dependencies between frames using transformers. First, our framework composes different combinations of sparsely annotated frames that capture the track of the overall joint movement. From these combinations, we derive an occlusion attention mask that enables encoding occlusion-aware heatmaps as a semi-supervised task. Second, the proposed temporal encoder employs a transformer architecture to effectively aggregate the temporal relationships and keypoint-wise attention from each time step and accurately refine the target frame's final pose estimate. We achieve state-of-the-art pose estimation results on the PoseTrack2017 and PoseTrack2018 datasets and demonstrate the robustness of our approach to occlusion and motion blur in sparsely annotated video data.
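To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch (not the authors' implementation; the paper does not publish this code) of how an occlusion mask can zero out occluded-joint heatmaps before a standard transformer encoder aggregates (time, joint) tokens and refines the target frame. All names, dimensions, and the masking-by-multiplication choice are illustrative assumptions.

```python
# Hypothetical sketch: occlusion-masked heatmap aggregation over time
# with a standard transformer encoder. Not the OTPose reference code.
import torch
import torch.nn as nn


class TemporalPoseEncoder(nn.Module):
    """Aggregate per-frame keypoint heatmap features across time.

    Assumed shapes: heatmaps (B, T, J, H, W); occlusion mask (B, T, J)
    with 1 for visible joints and 0 for occluded ones.
    """

    def __init__(self, feat_dim: int = 256, num_layers: int = 4, num_heads: int = 8):
        super().__init__()
        # Project each joint's flattened heatmap (H*W) to a token of size feat_dim.
        self.proj = nn.LazyLinear(feat_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Placeholder refinement head; a real model would decode back to heatmaps.
        self.head = nn.Linear(feat_dim, feat_dim)

    def forward(self, heatmaps: torch.Tensor, occ_mask: torch.Tensor) -> torch.Tensor:
        B, T, J, H, W = heatmaps.shape
        # Suppress occluded joints so they contribute no spurious evidence.
        masked = heatmaps * occ_mask[..., None, None]
        # One token per (frame, joint) pair: (B, T*J, feat_dim).
        tokens = self.proj(masked.reshape(B, T * J, H * W))
        # Self-attention mixes information across all frames and joints.
        encoded = self.encoder(tokens)
        # Keep only the target (last) frame's joint tokens for refinement.
        target = encoded.reshape(B, T, J, -1)[:, -1]  # (B, J, feat_dim)
        return self.head(target)


if __name__ == "__main__":
    model = TemporalPoseEncoder()
    hm = torch.rand(2, 5, 17, 64, 48)            # 5-frame window, 17 joints
    mask = (torch.rand(2, 5, 17) > 0.2).float()  # ~20% of joints marked occluded
    print(model(hm, mask).shape)                  # torch.Size([2, 17, 256])
```

An alternative to multiplicative masking would be passing the occlusion pattern as `src_key_padding_mask` to the encoder so occluded tokens are ignored by attention rather than zeroed; which variant matches the paper is not determinable from the abstract alone.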