Paper Title

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Paper Authors

Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Sijia Liu, Xue Lin, Bin Ren

Paper Abstract

Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. However, executing 3D Convolutional Neural Networks (CNNs) with real-time performance, in addition to high inference accuracy, remains a challenging task: the more complex model structure and higher model dimensionality overwhelm the computation/storage resources available on mobile devices. A natural approach is to turn to deep learning weight pruning techniques. However, directly generalizing existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs that seamlessly integrates neural network weight pruning and compiler code generation techniques. We propose and investigate two structured sparsity schemes, i.e., vanilla structured sparsity and kernel group structured (KGS) sparsity, that are mobile-acceleration friendly. Vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism. We propose a reweighted regularization pruning algorithm to achieve the proposed sparsity schemes. The inference time speedup due to sparsity approaches the pruning rate of the whole model's FLOPs (floating point operations). RT3D demonstrates up to 29.1$\times$ speedup in end-to-end inference time compared with current mobile frameworks supporting 3D CNNs, with a moderate 1%-1.5% accuracy loss. The end-to-end inference time for 16 video frames can be within 150 ms when executing representative C3D and R(2+1)D models on a cellphone. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobile devices.
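
To make the two sparsity schemes in the abstract concrete, below is a minimal NumPy sketch of how the corresponding pruning masks could be formed for a 3D convolution weight tensor. The tensor layout (C_out, C_in, K_t, K_h, K_w), the grouping along the output-channel dimension, the group size, and the magnitude-based selection rule are illustrative assumptions, not the paper's exact formulation; in particular, the reweighted regularization pruning algorithm itself is not reproduced here.

```python
# Sketch (under assumptions stated above) of vanilla vs. kernel-group-structured
# (KGS) sparsity masks for a 3D-conv weight tensor. Not the paper's algorithm.
import numpy as np

def vanilla_structured_mask(weight, group_size=4, prune_ratio=0.5):
    """Vanilla structured sparsity: whole kernel groups (here, groups of
    output channels) are either kept or zeroed together."""
    c_out = weight.shape[0]
    assert c_out % group_size == 0
    groups = weight.reshape(c_out // group_size, group_size, -1)     # (G, g, rest)
    scores = np.linalg.norm(groups, axis=(1, 2))                     # one score per group
    keep = scores >= np.quantile(scores, prune_ratio)                # keep the largest groups
    mask = np.repeat(keep, group_size)[:, None, None, None, None]    # (C_out, 1, 1, 1, 1)
    return np.broadcast_to(mask, weight.shape).astype(weight.dtype)

def kgs_mask(weight, group_size=4, prune_ratio=0.5):
    """KGS sparsity: within each kernel group, the same weight positions are
    pruned across all kernels of the group, so the surviving weights stay
    aligned for on-device parallel execution."""
    c_out = weight.shape[0]
    assert c_out % group_size == 0
    groups = weight.reshape(c_out // group_size, group_size, -1)     # (G, g, P), P = C_in*K_t*K_h*K_w
    scores = np.linalg.norm(groups, axis=1)                          # (G, P): score per position per group
    thresh = np.quantile(scores, prune_ratio, axis=1, keepdims=True)
    keep = scores >= thresh                                          # keep the strongest positions per group
    mask = np.repeat(keep[:, None, :], group_size, axis=1)           # same positions for every kernel in a group
    return mask.reshape(weight.shape).astype(weight.dtype)

# Toy example: a 3D-conv weight tensor of shape (C_out, C_in, K_t, K_h, K_w).
w = np.random.randn(8, 3, 3, 3, 3).astype(np.float32)
w_vanilla = w * vanilla_structured_mask(w)
w_kgs = w * kgs_mask(w)
print("vanilla nonzero fraction:", np.count_nonzero(w_vanilla) / w.size)
print("KGS nonzero fraction:   ", np.count_nonzero(w_kgs) / w.size)
```

The sketch illustrates why KGS is the finer-grained scheme: vanilla sparsity makes one keep/prune decision per kernel group, while KGS makes a decision per weight position but keeps that decision identical across the kernels of a group, preserving the regular memory access pattern that mobile parallelism relies on.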
