Paper Title
Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition
Paper Authors
Paper Abstract
Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully exploit the synergies among spatio-temporal modalities for gesture recognition. The problem is partly due to the fact that existing manually designed network architectures are inefficient at the joint learning of multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which is able to capture rich temporal context by aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective for understanding the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments are performed on three benchmark datasets (IsoGD, NvGesture, and EgoGesture), demonstrating state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS.
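The abstract does not detail the 3D-CDC operator itself; below is a minimal PyTorch sketch of a spatio-temporal central difference 3D convolution, assuming it follows the standard 2D central difference convolution formulation, y(p0) = theta * sum_n w(p_n) * (x(p0 + p_n) - x(p0)) + (1 - theta) * sum_n w(p_n) * x(p0 + p_n), which can be computed as a vanilla convolution minus a kernel-sum response. The class name `CDC3D` and the default `theta` value are hypothetical, not taken from the paper or its released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDC3D(nn.Module):
    """Illustrative sketch of a 3D central difference convolution.

    Blends a vanilla 3D convolution with a central-difference term:
        y = (1 - theta) * conv3d(x) + theta * conv3d_cd(x)
    where conv3d_cd aggregates differences between each neighbor and
    the kernel-center position. Algebraically this reduces to
        y = conv3d(x) - theta * (sum of kernel weights) * x,
    which is what forward() computes.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1, theta=0.6):
        super().__init__()
        # Standard 3D convolution; padding=1 with kernel_size=3 keeps
        # the output aligned with the 1x1x1 difference term below.
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.theta = theta  # hypothetical default; a tunable blend factor

    def forward(self, x):
        out = self.conv(x)  # vanilla 3D convolution response
        if self.theta == 0:
            return out
        # Central-difference term: a 1x1x1 convolution whose weights are
        # the per-(out, in)-channel sums of the full 3D kernel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        diff = F.conv3d(x, kernel_sum, stride=self.conv.stride)
        return out - self.theta * diff

# Example usage on a clip of 16 RGB frames at 112x112 resolution:
layer = CDC3D(in_channels=3, out_channels=64)
y = layer(torch.randn(1, 3, 16, 112, 112))  # (batch, channels, time, H, W)
```

Note that the paper proposes a family of 3D-CDC variants (e.g., restricting the difference term to the temporal neighborhood); the sketch above shows only the plain spatio-temporal case.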