Paper Title
Towards Enabling Dynamic Convolution Neural Network Inference for Edge Intelligence
Paper Authors
Paper Abstract
Deep learning applications have achieved great success in numerous real-world domains. Deep learning models, especially Convolutional Neural Networks (CNNs), are often prototyped on FPGAs because they offer high power efficiency and reconfigurability. The deployment of CNNs on FPGAs follows a design cycle that requires storing the model parameters in on-chip memory during High-Level Synthesis (HLS). Recent advances in edge intelligence require CNN inference on edge networks to increase throughput and reduce latency. To provide flexibility, dynamic parameter allocation to different mobile devices is required to implement either a predefined or an on-the-fly-defined CNN architecture. In this study, we present novel methodologies for dynamically streaming the model parameters at run-time to implement a traditional CNN architecture. We further propose a library-based approach to design scalable, dynamic distributed CNN inference on the fly by leveraging partial-reconfiguration techniques, which is particularly suitable for resource-constrained edge devices. The proposed techniques are implemented on the Xilinx PYNQ-Z2 board as a proof of concept using the LeNet-5 CNN model. The results show that the proposed methodologies are effective, achieving classification accuracy rates of 92%, 86%, and 94%, respectively.