Paper Title

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

Paper Authors

Zhen Huang, Xu Shen, Xinmei Tian, Houqiang Li, Jianqiang Huang, Xian-Sheng Hua

Paper Abstract

Skeleton-based human action recognition has attracted much attention with the prevalence of accessible depth sensors. Recently, graph convolutional networks (GCNs) have been widely used for this task due to their powerful capability to model graph data. The topology of the adjacency graph is a key factor for modeling the correlations of the input skeletons. Thus, previous methods mainly focus on the design/learning of the graph topology. But once the topology is learned, only a single-scale feature and one transformation exist in each layer of the networks. Many insights, such as multi-scale information and multiple sets of transformations, that have been proven to be very effective in convolutional neural networks (CNNs), have not been investigated in GCNs. The reason is that, due to the gap between graph-structured skeleton data and conventional image/video data, it is very challenging to embed these insights into GCNs. To overcome this gap, we reinvent the split-transform-merge strategy in GCNs for skeleton sequence processing. Specifically, we design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition. Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths. Extensive experiments demonstrate that our network outperforms state-of-the-art methods by a significant margin with only 1/5 of the parameters and 1/10 of the FLOPs. Code is available at https://github.com/yellowtownhz/STIGCN.
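
For readers curious how the split-transform-merge idea maps onto skeleton data, the sketch below (in PyTorch, and not the authors' released STIGCN code) shows one possible building block: parallel spatial branches that aggregate 1-hop and 2-hop joint neighborhoods, parallel temporal branches with different kernel sizes, and a merge by channel concatenation. All class and parameter names here (SpatialGraphConv, STInceptionBlock, branch_channels, temporal_kernels) are illustrative assumptions, not identifiers from the paper or the repository.

```python
# Minimal sketch of a split-transform-merge spatio-temporal block for skeleton
# sequences, assuming input of shape (N, C, T, V):
# N = batch, C = channels, T = frames, V = joints.
import torch
import torch.nn as nn


class SpatialGraphConv(nn.Module):
    """One spatial branch: pointwise feature transform, then aggregation over
    joints with a fixed (normalized) adjacency matrix, applied frame-wise."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)          # (V, V), not learned here
        self.linear = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                              # x: (N, C, T, V)
        x = self.linear(x)                             # per-joint feature transform
        return torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate over joints


class STInceptionBlock(nn.Module):
    """Split-transform-merge block: parallel spatial branches (different graph
    scales) and temporal branches (different kernel sizes), merged by concat."""

    def __init__(self, in_channels, branch_channels, adjacency,
                 temporal_kernels=(3, 5, 9)):
        super().__init__()
        # Two spatial granularities: 1-hop (A) and 2-hop (A @ A) neighborhoods.
        hops = [adjacency, adjacency @ adjacency]
        self.spatial = nn.ModuleList(
            [SpatialGraphConv(in_channels, branch_channels, a) for a in hops]
        )
        # Temporal granularities: convolutions over the frame axis only.
        self.temporal = nn.ModuleList(
            [nn.Conv2d(in_channels, branch_channels,
                       kernel_size=(k, 1), padding=(k // 2, 0))
             for k in temporal_kernels]
        )
        merged = branch_channels * (len(hops) + len(temporal_kernels))
        self.bn = nn.BatchNorm2d(merged)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                              # x: (N, C, T, V)
        branches = [b(x) for b in self.spatial] + [b(x) for b in self.temporal]
        return self.relu(self.bn(torch.cat(branches, dim=1)))  # merge channels


if __name__ == "__main__":
    V = 25                                  # e.g. 25 joints in an NTU RGB+D skeleton
    A = torch.eye(V)                        # placeholder adjacency; use the real skeleton graph
    block = STInceptionBlock(in_channels=64, branch_channels=16, adjacency=A)
    out = block(torch.randn(2, 64, 32, V))  # (batch, channels, frames, joints)
    print(out.shape)                        # torch.Size([2, 80, 32, 25])
```

The merge-by-concatenation keeps each branch cheap (few channels per branch), which is consistent with the abstract's claim of a small parameter and FLOP budget relative to single-path GCN layers.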
