Paper Title

In-Place Gestures Classification via Long-term Memory Augmented Network

Paper Authors

Lizhi Zhao, Xuequan Lu, Qianyue Bao, Meili Wang

Paper Abstract

In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and intuitively move in the 3D virtual environment. A key research problem is to accurately and quickly recognize in-place gestures, since they can trigger specific movements of virtual viewpoints and enhance user experience. However, to achieve real-time experience, only short-term sensor sequence data (up to about 300ms, 6 to 10 frames) can be taken as input, which limits the classification performance due to the restricted spatio-temporal information. In this paper, we propose a novel long-term memory augmented network for in-place gestures classification. It takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples that provide extra relevant spatio-temporal information in the training phase. We store long-term sequence features with an external memory queue. In addition, we design a memory augmented loss to help cluster features of the same class and push apart features from different classes, thus enabling our memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples to recall the stored features accordingly, and fuse them together to predict the gesture class. We create a large-scale in-place gestures dataset from 25 participants with 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192ms, and an accuracy of 97.3% with a latency of 312ms, and is demonstrated to be superior to recent in-place gesture classification techniques. A user study also validates our approach. Our source code and dataset will be made available to the community.
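The abstract describes the method only at a high level. As a reading aid, below is a minimal sketch of the core ideas it mentions: an external memory queue that stores long-term sequence features, similarity-based recall and fusion at inference time when only the short-term sample is available, and a contrastive-style memory augmented loss that pulls together features of the same class and pushes apart different classes. The encoder architectures, feature dimensions, queue size, recall rule, and exact loss form are not specified in the abstract, so every name and number below is an illustrative assumption rather than the authors' implementation.

```python
# Hedged sketch of a long-term memory augmented classifier.
# All modules, dimensions, and the cosine-similarity recall are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryAugmentedClassifier(nn.Module):
    def __init__(self, short_in=6 * 30, long_in=30 * 30, feat_dim=128,
                 num_classes=11, queue_size=1024):
        super().__init__()
        # Stand-in encoders for short-term and long-term sequences
        # (the paper presumably uses temporal networks; MLPs keep this runnable).
        self.short_encoder = nn.Sequential(nn.Linear(short_in, 256), nn.ReLU(),
                                           nn.Linear(256, feat_dim))
        self.long_encoder = nn.Sequential(nn.Linear(long_in, 256), nn.ReLU(),
                                          nn.Linear(256, feat_dim))
        self.classifier = nn.Linear(feat_dim * 2, num_classes)
        # External FIFO memory queue of long-term features and their labels.
        self.register_buffer("queue", F.normalize(torch.randn(queue_size, feat_dim), dim=1))
        self.register_buffer("queue_labels", torch.zeros(queue_size, dtype=torch.long))
        self.register_buffer("queue_ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def enqueue(self, long_feat, labels):
        """Overwrite the oldest queue entries with the newest long-term features."""
        n = long_feat.shape[0]
        idx = (torch.arange(n, device=long_feat.device) + int(self.queue_ptr)) % self.queue.shape[0]
        self.queue[idx] = F.normalize(long_feat, dim=1)
        self.queue_labels[idx] = labels
        self.queue_ptr[0] = (int(self.queue_ptr) + n) % self.queue.shape[0]

    def recall(self, short_feat, k=8):
        """Recall the k memorized long-term features most similar (cosine)
        to each short-term feature and average them."""
        sim = F.normalize(short_feat, dim=1) @ self.queue.t()   # (B, Q)
        topk = sim.topk(k, dim=1).indices                       # (B, k)
        return self.queue[topk].mean(dim=1)                     # (B, D)

    def forward(self, short_seq, long_seq=None, labels=None):
        short_feat = self.short_encoder(short_seq.flatten(1))
        if self.training and long_seq is not None:
            # Training: the paired long-term sample supplies extra
            # spatio-temporal context and populates the memory queue.
            long_feat = self.long_encoder(long_seq.flatten(1))
            self.enqueue(long_feat.detach(), labels)
            fused = torch.cat([short_feat, long_feat], dim=1)
        else:
            # Inference: only the short-term sample is given; recall the
            # stored long-term features and fuse them with it.
            fused = torch.cat([short_feat, self.recall(short_feat)], dim=1)
        return self.classifier(fused), short_feat


def memory_augmented_loss(short_feat, queue, queue_labels, labels, temperature=0.1):
    """Contrastive-style term: pull short-term features toward memorized
    long-term features of the same class, push them away from other classes."""
    sim = F.normalize(short_feat, dim=1) @ queue.t() / temperature      # (B, Q)
    pos = (labels.unsqueeze(1) == queue_labels.unsqueeze(0)).float()    # (B, Q)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```

Under these assumptions, a training step would combine cross-entropy on the fused prediction with the memory augmented term, e.g. `logits, feat = model(short, long, labels)` followed by `F.cross_entropy(logits, labels) + memory_augmented_loss(feat, model.queue, model.queue_labels, labels)`, while inference simply calls `model(short)`. The actual fusion scheme and loss weighting should be taken from the authors' released code once available.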
