Paper Title
Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition
Paper Authors
Paper Abstract
The purpose of gesture recognition is to recognize meaningful movements of the human body, and it is an important problem in computer vision. In this paper, we present a multimodal gesture recognition method based on 3D densely connected convolutional networks (3D-DenseNets) and improved temporal convolutional networks (TCNs). The key idea of our approach is to find a compact and effective representation of spatial and temporal features by dividing the task of gesture video analysis into two sequential parts: spatial analysis and temporal analysis. In spatial analysis, we adopt 3D-DenseNets to learn short-term spatio-temporal features effectively. Subsequently, in temporal analysis, we use TCNs to extract temporal features and employ improved Squeeze-and-Excitation Networks (SENets) to strengthen the representational power of the temporal features from each TCN layer. The method has been evaluated on the VIVA and the NVIDIA Dynamic Hand Gesture Datasets. Our approach obtains very competitive performance on the VIVA benchmark with a classification accuracy of 91.54%, and achieves state-of-the-art performance with 86.37% accuracy on the NVIDIA benchmark.
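To make the squeeze-and-excitation step in the temporal analysis concrete, the following is a minimal NumPy sketch of SE-style channel recalibration applied to a (channels × time) feature map, as an SE block might reweight a TCN layer's output. The shapes, reduction ratio `r`, and weight initialization are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_recalibrate(features, w1, w2):
    """Squeeze-and-Excitation over a (C, T) temporal feature map.

    Squeeze: global average pooling over the time axis -> (C,).
    Excitation: two linear layers with a bottleneck, then sigmoid gates.
    Scale: reweight each channel of the input by its learned gate.
    """
    squeezed = features.mean(axis=1)          # (C,) per-channel summary
    hidden = np.maximum(0.0, w1 @ squeezed)   # ReLU bottleneck, (C // r,)
    gates = sigmoid(w2 @ hidden)              # (C,) gates in (0, 1)
    return features * gates[:, None]          # channel-wise rescaling

# Toy example: 8 channels, 16 time steps, reduction ratio 4 (all assumed values).
rng = np.random.default_rng(0)
C, T, r = 8, 16, 4
feats = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1   # squeeze-side weights
w2 = rng.standard_normal((C, C // r)) * 0.1   # excitation-side weights
out = se_recalibrate(feats, w1, w2)
```

Because the gates lie strictly in (0, 1), the block can only attenuate channels relative to the input, letting the network emphasize informative temporal channels and suppress weak ones.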