Paper Title
Hand Gestures Recognition in Videos Taken with Lensless Camera
Paper Authors
Paper Abstract
A lensless camera is an imaging system that uses a mask in place of a lens, making it thinner, lighter, and less expensive than a lensed camera. However, image reconstruction requires additional, complex computation and time. This work proposes a deep learning model named Raw3dNet that recognizes hand gestures directly on raw videos captured by a lensless camera, without the need for image restoration. In addition to conserving computational resources, the reconstruction-free approach provides privacy protection. Raw3dNet is a novel end-to-end deep neural network model for recognizing hand gestures in lensless imaging systems. It is designed specifically for raw video captured by a lensless camera and can properly extract and combine temporal and spatial features. The network consists of two stages: (1) a spatial feature extractor (SFE), which enhances the spatial features of each frame prior to temporal convolution, and (2) a 3D-ResNet, which performs spatial and temporal convolution on the video stream. The proposed model achieves 98.59% accuracy on the Cambridge Hand Gesture dataset in a lensless optical experiment, which is comparable to the lensed-camera result. Additionally, the feasibility of physical object recognition is assessed. Furthermore, we show that recognition can be achieved with respectable accuracy using only a tiny portion of the raw data, indicating the potential for reducing data traffic in cloud computing scenarios.
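To make the two-stage structure described in the abstract concrete, below is a minimal PyTorch sketch of a per-frame spatial feature extractor followed by a small 3D residual trunk. The class names (Raw3dNetSketch, SpatialFeatureExtractor), layer widths, block counts, and the 3D-ResNet depth are illustrative assumptions, not the paper's exact Raw3dNet configuration.

```python
# Illustrative sketch only: stage 1 applies 2D convolutions frame by frame to
# enhance spatial features of the raw lensless measurements; stage 2 applies
# 3D (spatio-temporal) residual convolutions to the whole clip.
import torch
import torch.nn as nn


class SpatialFeatureExtractor(nn.Module):
    """Per-frame 2D convolutions applied before any temporal convolution (stage 1)."""

    def __init__(self, in_channels=1, out_channels=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # x: (batch, channels, time, height, width) raw lensless video
        b, c, t, h, w = x.shape
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)  # fold time into batch
        feats = self.net(frames)
        return feats.reshape(b, t, -1, h, w).permute(0, 2, 1, 3, 4)  # (B, C', T, H, W)


class ResidualBlock3D(nn.Module):
    """Basic 3D residual block performing joint spatial and temporal convolution (stage 2)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual connection


class Raw3dNetSketch(nn.Module):
    """SFE followed by a small 3D-ResNet-style trunk and a classification head."""

    def __init__(self, num_classes=9, sfe_channels=16):
        super().__init__()
        self.sfe = SpatialFeatureExtractor(in_channels=1, out_channels=sfe_channels)
        self.trunk = nn.Sequential(
            ResidualBlock3D(sfe_channels),
            ResidualBlock3D(sfe_channels),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(sfe_channels, num_classes)

    def forward(self, x):
        x = self.sfe(x)               # per-frame spatial enhancement
        x = self.trunk(x)             # spatio-temporal convolution
        x = self.pool(x).flatten(1)   # global average pooling over T, H, W
        return self.fc(x)             # gesture class logits


if __name__ == "__main__":
    # Dummy raw video: 2 clips, 1 channel, 16 frames, 64x64 sensor crop (sizes assumed).
    clips = torch.randn(2, 1, 16, 64, 64)
    model = Raw3dNetSketch(num_classes=9)  # Cambridge Hand Gesture has 9 gesture classes
    print(model(clips).shape)  # -> torch.Size([2, 9])
```

The key design point this sketch tries to reflect is the ordering: spatial features of each raw frame are enhanced by 2D convolutions before the 3D convolutions mix information across frames.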