Paper Title

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Authors

Huazhang Hu, Sixun Dong, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao

Abstract

Repetitive actions are widely seen in human activities such as physical exercise. Existing methods focus on counting repetitive actions in short videos and struggle with longer videos in more realistic scenarios. In the data-driven era, this loss of generalization capability is mainly attributed to the lack of long video datasets. To fill this gap, we introduce a new large-scale repetitive action counting dataset covering a wide variety of video lengths, along with more realistic situations where action interruptions or action inconsistencies occur in the video. We also provide fine-grained annotations of the action cycles rather than only a numerical count. The dataset contains 1,451 videos with about 20,000 annotations and is more challenging. For repetitive action counting in more realistic scenarios, we further propose encoding multi-scale temporal correlation with transformers, which takes both performance and efficiency into account. Furthermore, with the help of the fine-grained annotations of action cycles, we propose a density map regression-based method to predict the action period, which yields better performance with sufficient interpretability. Our proposed method outperforms state-of-the-art methods on all datasets and also achieves better performance on the unseen dataset without fine-tuning. The dataset and code are available.
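The density map idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each annotated action cycle is given as a (start, end) frame pair and that the regression target places a normalized Gaussian at the midpoint of each cycle, so that summing the map recovers the count. The function names `density_map` and `count_from_density` and the `sigma_scale` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def density_map(cycles, num_frames, sigma_scale=0.25):
    """Build a per-frame density target from annotated cycles (illustrative only).

    cycles: list of (start, end) frame indices, one pair per action cycle.
    Each cycle contributes a Gaussian bump normalized to sum to 1.
    """
    frames = np.arange(num_frames, dtype=np.float64)
    density = np.zeros(num_frames, dtype=np.float64)
    for start, end in cycles:
        center = (start + end) / 2.0
        # Assumed choice: width proportional to cycle length, floored at 0.5
        # frames so the bump never vanishes numerically.
        sigma = max((end - start) * sigma_scale, 0.5)
        bump = np.exp(-0.5 * ((frames - center) / sigma) ** 2)
        density += bump / bump.sum()
    return density


def count_from_density(density):
    """The predicted count is the sum of the density map."""
    return float(density.sum())


if __name__ == "__main__":
    cycles = [(0, 10), (12, 20), (30, 45)]  # made-up annotations
    target = density_map(cycles, num_frames=64)
    print(round(count_from_density(target), 6))
```

Under these assumptions, a model that regresses such a map produces a count by summation, and the location of each peak indicates where a cycle occurs, which is one way to read the interpretability claim.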
