Title
HQNAS: Auto CNN deployment framework for joint quantization and architecture search
Authors
Abstract
Deep learning applications are moving from the cloud to the edge with the rapid development of embedded computing systems. To achieve higher energy efficiency under a limited resource budget, neural networks (NNs) must be carefully designed in two steps: architecture design and quantization policy selection. Neural Architecture Search (NAS) and quantization have been proposed separately for deploying NNs onto embedded devices. However, taking the two steps individually is time-consuming and leads to a sub-optimal final deployment. To this end, we propose a novel neural network design framework called the Hardware-aware Quantized Neural Architecture Search (HQNAS) framework, which combines NAS and quantization in a highly efficient manner using weight-sharing and bit-sharing. It takes only 4 GPU hours to discover an outstanding NN policy on CIFAR-10. It also takes only 10% of the GPU time to generate a comparable model on ImageNet compared to a traditional NAS method, with a 1.8x latency reduction and a negligible accuracy loss of only 0.7%. Moreover, our method can be adapted to a lifelong scenario in which the neural network needs to evolve occasionally due to changes in local data, environment, and user preference.
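To make the idea of a joint search space concrete, the following is a minimal toy sketch (not the paper's actual method) in which each layer chooses an architectural option (channel width) and a weight bitwidth together, and the best configuration is selected under a hardware latency budget. The candidate values, latency model, and accuracy proxy are all illustrative assumptions.

```python
import itertools

WIDTHS = [16, 32]  # hypothetical candidate channel widths per layer
BITS = [4, 8]      # hypothetical candidate weight bitwidths per layer

def latency(width, bits):
    # Toy latency model: cost grows with both width and bitwidth.
    return width * bits / 64.0

def accuracy_proxy(width, bits):
    # Toy accuracy proxy: more capacity and precision score higher.
    return width / 32.0 + bits / 8.0

def joint_search(num_layers=2, latency_budget=6.0):
    """Enumerate joint (width, bits) choices per layer and return the
    best-scoring configuration that fits the latency budget."""
    best, best_score = None, float("-inf")
    per_layer = list(itertools.product(WIDTHS, BITS))
    for config in itertools.product(per_layer, repeat=num_layers):
        total_lat = sum(latency(w, b) for w, b in config)
        if total_lat > latency_budget:
            continue  # violates the hardware constraint
        score = sum(accuracy_proxy(w, b) for w, b in config)
        if score > best_score:
            best, best_score = config, score
    return best, best_score

if __name__ == "__main__":
    cfg, score = joint_search()
    print(cfg, score)
```

The key point the sketch illustrates is that width and bitwidth are optimized in a single pass rather than in two sequential stages; HQNAS makes such a joint search tractable at scale by sharing weights (and bits) across candidates instead of enumerating them exhaustively as done here.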