Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition

Paper Title

Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition

Authors

Chen, Jiawei; Hsiao, Jenson; Ho, Chiu Man

Abstract

Human action recognition is regarded as a cornerstone of domains such as surveillance and video understanding. Despite recent progress in end-to-end solutions for video-based action recognition, achieving state-of-the-art performance still requires auxiliary hand-crafted motion representations, e.g., optical flow, which are usually computationally demanding. In this work, we propose to use residual frames (i.e., differences between adjacent RGB frames) as an alternative "lightweight" motion representation, which carries salient motion information and is computationally efficient. In addition, we develop a new pseudo-3D convolution module that decouples 3D convolution into 2D and 1D convolutions. The proposed module exploits residual information in the feature space to better structure motions, and is equipped with a self-attention mechanism that helps recalibrate the appearance and motion features. Empirical results confirm the efficiency and effectiveness of residual frames as well as the proposed pseudo-3D convolution module.
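
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `residual_frames` and `conv_weight_counts`, the `(T, H, W, C)` clip layout, the kernel size of 3, and the assumption that the intermediate channel count equals the output channel count are all hypothetical choices made for illustration. The first function computes residual frames as differences between adjacent RGB frames; the second compares the weight count of a full 3D convolution with that of a 2D spatial convolution followed by a 1D temporal convolution.

```python
from __future__ import annotations

import numpy as np


def residual_frames(clip: np.ndarray) -> np.ndarray:
    """Return differences between adjacent frames of a clip.

    clip: array of shape (T, H, W, C), e.g. uint8 RGB frames (assumed layout).
    The result has shape (T - 1, H, W, C) and a signed dtype.
    """
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected a (T, H, W, C) clip with at least 2 frames")
    # Cast first: subtracting uint8 arrays directly would wrap around.
    frames = clip.astype(np.int16)
    return frames[1:] - frames[:-1]


def conv_weight_counts(c_in: int, c_out: int, k: int = 3) -> tuple[int, int]:
    """Weight counts (biases ignored) for a full k*k*k 3D convolution versus
    a 1*k*k spatial convolution followed by a k*1*1 temporal convolution.

    Assumption: the intermediate channel count equals c_out. The module
    described in the paper may be configured differently.
    """
    full_3d = c_in * c_out * k * k * k
    factorized = c_in * c_out * k * k + c_out * c_out * k
    return full_3d, factorized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A random stand-in for an 8-frame, 32x32 RGB clip.
    clip = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
    res = residual_frames(clip)
    print(res.shape, res.dtype)
    print(conv_weight_counts(64, 64))
```

Computing a residual frame needs only one subtraction per pixel, which is the sense in which it is "lightweight" next to optical flow. The cast to a signed type matters because residuals can be negative.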
