Spatial Temporal Transformer Network for Skeleton-based Action Recognition

Paper title

Spatial Temporal Transformer Network for Skeleton-based Action Recognition

Authors

Chiara Plizzari, Marco Cannici, Matteo Matteucci

Abstract

Skeleton-based human action recognition has attracted great interest in recent years, as skeleton data has been shown to be robust to illumination changes, body scale, dynamic camera views, and complex backgrounds. Nevertheless, effectively encoding the latent information underlying the 3D skeleton remains an open problem. In this work, we propose a novel Spatial-Temporal Transformer network (ST-TR), which models dependencies between joints using the Transformer self-attention operator. In our ST-TR model, a Spatial Self-Attention module (SSA) captures intra-frame interactions between different body parts, and a Temporal Self-Attention module (TSA) models inter-frame correlations. The two are combined in a two-stream network that outperforms state-of-the-art models using the same input data on both NTU-RGB+D 60 and NTU-RGB+D 120.
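The distinction between the spatial and temporal modules can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single skeleton sequence stored as an array of shape (frames, joints, channels), uses plain single-head scaled dot-product attention, and leaves out everything the abstract does not describe (multiple heads, normalization, the two-stream combination). The function names, weight matrices, and sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x has shape (..., N, C_in); attention is computed among the N
    elements of the second-to-last axis, independently for each
    leading index. wq, wk, wv have shape (C_in, D).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def spatial_self_attention(x, wq, wk, wv):
    # x: (T, V, C). Within each frame, every joint attends to all
    # joints of that same frame (intra-frame interactions).
    return self_attention(x, wq, wk, wv)

def temporal_self_attention(x, wq, wk, wv):
    # x: (T, V, C). For each joint, every frame attends to all
    # frames of that same joint (inter-frame correlations).
    xt = np.swapaxes(x, 0, 1)              # (V, T, C)
    out = self_attention(xt, wq, wk, wv)   # (V, T, D)
    return np.swapaxes(out, 0, 1)          # (T, V, D)

# Illustrative sizes only: 4 frames, 25 joints, 3D coordinates, 8 features.
rng = np.random.default_rng(0)
T, V, C, D = 4, 25, 3, 8
x = rng.standard_normal((T, V, C))
wq, wk, wv = (rng.standard_normal((C, D)) for _ in range(3))

ssa = spatial_self_attention(x, wq, wk, wv)
tsa = temporal_self_attention(x, wq, wk, wv)
```

Both functions return an array of shape (T, V, D). The only difference is the axis along which attention weights are computed: across joints for the spatial variant, across frames for the temporal one.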
