Paper Title
Hierarchical Compositional Representations for Few-shot Action Recognition
Paper Authors
Paper Abstract
Recently, action recognition has received growing attention for its broad practical applications in intelligent surveillance and human-computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition. Specifically, we divide a complicated action into several sub-actions by carefully designed hierarchical clustering and further decompose the sub-actions into finer-grained spatially attentional sub-actions (SAS-actions). Although base classes and novel classes differ substantially, they can share similar patterns at the level of sub-actions or SAS-actions. Furthermore, we adopt the Earth Mover's Distance from the transportation problem to measure the similarity between video samples in terms of their sub-action representations. It computes the optimal matching flows between sub-actions and uses them as the distance metric, which is favorable for comparing fine-grained patterns. Extensive experiments show that our method achieves state-of-the-art results on the HMDB51, UCF101 and Kinetics datasets.
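To make the matching step concrete, the following is a minimal sketch of an Earth Mover's Distance between two sets of sub-action vectors. It is not the paper's implementation. The abstract does not state the cost function or the sub-action weights, so the sketch assumes cosine distance as the ground cost, uniform weights, and the same number of sub-actions in both videos. The names `cosine_distance` and `emd_uniform` are hypothetical. Under uniform weights an optimal transport plan is a one-to-one matching, so a brute-force search over permutations is exact for small sets.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def emd_uniform(sub_actions_a, sub_actions_b):
    """Earth Mover's Distance between two equally sized sets of vectors.

    Assumption (not from the paper): every sub-action has weight 1/n.
    With uniform weights the optimal flow is a permutation, so we can
    enumerate all matchings. This is only practical for small n.
    """
    n = len(sub_actions_a)
    if n == 0 or n != len(sub_actions_b):
        raise ValueError("both videos need the same non-zero number of sub-actions")

    # Ground cost between every pair of sub-actions (assumed: cosine distance).
    cost = [[cosine_distance(a, b) for b in sub_actions_b] for a in sub_actions_a]

    best_cost, best_match = math.inf, None
    for perm in permutations(range(n)):
        # Each matched pair moves mass 1/n, so the total cost is the mean.
        total = sum(cost[i][perm[i]] for i in range(n)) / n
        if total < best_cost:
            best_cost, best_match = total, perm
    return best_cost, best_match


# Toy example: two "videos" with two 2-D sub-action vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
support = [[0.0, 2.0], [3.0, 0.0]]
distance, matching = emd_uniform(query, support)
# matching[i] is the index of the support sub-action matched to query[i].
```

In this toy case the sub-actions of the two videos point in the same directions but appear in a different order, so the optimal matching pairs them crosswise and the distance is zero. A frame-by-frame comparison in the original order would instead report the maximum cosine distance. For unequal weights or set sizes, a general linear-programming solver for the transportation problem would be needed instead of permutation search.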