Paper Title

Modality Compensation Network: Cross-Modal Adaptation for Action Recognition

Paper Authors

Sijie Song, Jiaying Liu, Yanghao Li, Zongming Guo

Paper Abstract

With the prevalence of RGB-D cameras, multi-modal video data have become more available for human action recognition. One main challenge for this task lies in how to effectively leverage their complementary information. In this work, we propose a Modality Compensation Network (MCN) to explore the relationships between different modalities and boost the representations for human action recognition. We regard RGB/optical-flow videos as source modalities and skeletons as the auxiliary modality. Our goal is to extract more discriminative features from the source modalities with the help of the auxiliary modality. Built on deep Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, our model bridges data from the source and auxiliary modalities through a modality adaptation block to achieve adaptive representation learning, so that the network learns to compensate for the loss of skeletons at test time, and even at training time. We explore multiple adaptation schemes to narrow the distance between the source and auxiliary modal distributions at different levels, according to the alignment of source and auxiliary data during training. In addition, skeletons are required only in the training phase; at test time, our model improves recognition performance using source data alone. Experimental results show that MCN outperforms state-of-the-art approaches on four widely used action recognition benchmarks.
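To make the adaptation idea concrete, the sketch below illustrates one way the description in the abstract could be realized: a CNN + LSTM branch extracts features from the source modality (RGB or optical flow), an LSTM branch extracts features from skeletons during training, and an alignment penalty narrows the distance between the two feature distributions so that only the source branch is needed at test time. This is a minimal, illustrative sketch and not the authors' implementation: the tiny CNN backbone, all layer sizes, the 25-joint skeleton dimension, and the mean-feature alignment loss standing in for the paper's adaptation schemes are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch of cross-modal adaptation for action recognition.
# NOT the MCN implementation; backbone, dimensions, and the alignment loss
# (a simple mean-feature discrepancy) are placeholder assumptions.

class SourceBranch(nn.Module):
    """CNN + LSTM branch over source frames (e.g., RGB or optical flow)."""
    def __init__(self, feat_dim=128, hidden_dim=128, num_classes=60):
        super().__init__()
        # A tiny CNN stands in for the deep CNN backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        feats, _ = self.lstm(x)
        source_feat = feats[:, -1]                 # last hidden state as clip feature
        return source_feat, self.classifier(source_feat)

class AuxiliaryBranch(nn.Module):
    """LSTM branch over skeleton sequences (auxiliary modality, training only)."""
    def __init__(self, joint_dim=75, hidden_dim=128):  # 25 joints x (x, y, z), assumed
        super().__init__()
        self.lstm = nn.LSTM(joint_dim, hidden_dim, batch_first=True)

    def forward(self, skeletons):                  # skeletons: (B, T, joint_dim)
        feats, _ = self.lstm(skeletons)
        return feats[:, -1]

def alignment_loss(source_feat, aux_feat):
    """Distribution-alignment penalty; a simple mean-feature discrepancy is assumed."""
    return (source_feat.mean(0) - aux_feat.mean(0)).pow(2).sum()

# Training step (sketch): classification on source data plus alignment to skeletons.
source_net, aux_net = SourceBranch(), AuxiliaryBranch()
frames = torch.randn(4, 8, 3, 64, 64)              # dummy RGB clip
skeletons = torch.randn(4, 8, 75)                  # dummy skeleton sequence
labels = torch.randint(0, 60, (4,))

src_feat, logits = source_net(frames)
aux_feat = aux_net(skeletons)
loss = nn.functional.cross_entropy(logits, labels) + 0.1 * alignment_loss(src_feat, aux_feat)
loss.backward()
```

At test time, only the source branch would be evaluated on source data; the skeleton branch and the alignment term are dropped, consistent with the abstract's statement that skeletons are required only during training.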
