Paper Title

Generative Design of Hardware-aware DNNs

Paper Authors

Kao, Sheng-Chun, Ramamurthy, Arun, Krishna, Tushar

Paper Abstract

To efficiently run DNNs on the edge or in the cloud, many new DNN inference accelerators are being designed and deployed. To improve the resource efficiency of DNNs, model quantization is a widely used approach. However, different accelerators/HW platforms offer different resources, which calls for a specialized quantization strategy for each HW platform. Moreover, using the same quantization for every layer may be sub-optimal, which enlarges the design space of possible quantization choices and makes manual tuning infeasible. Recent work on automatically determining per-layer quantization is driven by optimization methods such as reinforcement learning (RL). However, these approaches require re-training the RL agent for every new HW platform. We propose a new approach to autonomous quantization and HW-aware tuning. We propose a generative model, AQGAN, which takes a target accuracy as the condition and generates a suite of quantization configurations. With the conditional generative model, the user can autonomously generate configurations for different targets at inference time. Moreover, we propose a simplified HW-tuning flow that uses the generative model to generate proposals and performs a simple selection based on the HW resource budget, making the process fast and interactive. We evaluate our model on five widely used efficient models on the ImageNet dataset and compare against existing uniform quantization and state-of-the-art autonomous quantization methods. Our generative model achieves competitive accuracy while reducing the search cost for each design point by around two orders of magnitude, and the generated quantization configurations lead to less than 3.5% error across all experiments.
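
To make the described flow concrete, below is a minimal, hypothetical Python sketch of the generate-then-select loop: a trained conditional generator (standing in for AQGAN) is queried with a target accuracy, and the resulting per-layer bit-width proposals are filtered against a HW resource budget. The generator stub, the cost model, and all constants are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (an assumption, not the paper's code) of the flow described
# in the abstract: query a conditional generator with a target accuracy,
# then filter the proposed per-layer bit-width configurations by a HW budget.
import numpy as np

NUM_LAYERS = 20             # assumed number of quantizable layers
LATENT_DIM = 64             # assumed latent-noise dimension for the generator
BIT_CHOICES = [2, 4, 6, 8]  # assumed per-layer bit-width options


def generator(z, target_accuracy):
    """Stand-in for a trained conditional generator such as AQGAN.

    A real model would map (noise, target accuracy) to a bit-width
    configuration; here we simply sample one at random for illustration.
    """
    return np.random.choice(BIT_CHOICES, size=NUM_LAYERS)


def estimate_cost(bitwidths, weights_per_layer):
    """Toy HW cost model: total weight-memory footprint in bits."""
    return int(np.dot(bitwidths, weights_per_layer))


def propose_and_select(target_accuracy, hw_budget_bits, weights_per_layer,
                       num_proposals=32):
    """Generate a suite of proposals conditioned on the target accuracy and
    keep only those that fit within the HW resource budget."""
    kept = []
    for _ in range(num_proposals):
        z = np.random.randn(LATENT_DIM)
        config = generator(z, target_accuracy)
        cost = estimate_cost(config, weights_per_layer)
        if cost <= hw_budget_bits:
            kept.append((config, cost))
    # Prefer configurations that use the available budget most fully.
    kept.sort(key=lambda pair: -pair[1])
    return kept


if __name__ == "__main__":
    layer_sizes = np.full(NUM_LAYERS, 1_000_000)  # dummy per-layer weight counts
    proposals = propose_and_select(target_accuracy=0.75,
                                   hw_budget_bits=120_000_000,
                                   weights_per_layer=layer_sizes)
    for config, cost in proposals[:3]:
        print(cost, config.tolist())
```

Because selection is a cheap filter over generated proposals rather than a new search, retargeting a different HW budget only requires re-running the loop above, which is what makes the tuning flow fast and interactive.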
