AUSN：通过适应性地叠加不均匀分布的深神经网络，大约统一的量化

论文标题

AUSN：通过适应性地叠加不均匀分布的深神经网络，大约统一的量化

AUSN: Approximately Uniform Quantization by Adaptively Superimposing Non-uniform Distribution for Deep Neural Networks

论文作者

Fangxin, Liu, Wenbo, Zhao, Yanzhi, Wang, Changzhi, Dai, Li, Jiang

论文摘要

量化对于简化边缘应用中的DNN推断至关重要。 Existing uniform and non-uniform quantization methods, however, exhibit an inherent conflict between the representing range and representing resolution, and thereby result in either underutilized bit-width or significant accuracy drop. Moreover, these methods encounter three drawbacks: i) the absence of a quantitative metric for in-depth analysis of the source of the quantization errors; ii）基于CNN的图像分类任务的有限重点； iii）通过降低位宽度来降低的真实硬件和能源消耗的意识。 In this paper, we first define two quantitative metrics, i.e., the Clipping Error and rounding error, to analyze the quantization error distribution.我们观察到，边界和圆形误差在各个层，模型和任务之间差异很大。因此，我们提出了一种新颖的量化方法来量化重量和激活。关键思想是通过自适应超级超级量化多个非均匀量化值（即AUSN）来近似统一的量化。 AUSN is consist of a decoder-free coding scheme that efficiently exploits the bit-width to its extreme, a superposition quantization algorithm that can adapt the coding scheme to different DNN layers, models and tasks without extra hardware design effort, and a rounding scheme that can eliminate the well-known bit-width overflow and re-quantization issues. Theoretical analysis~(see Appendix A) and accuracy evaluation on various DNN models of different tasks show the effectiveness and generalization of AUSN. The synthesis~(see Appendix B) results on FPGA show $2\times$ reduction of the energy consumption, and $2\times$ to $4\times$ reduction of the hardware resource.

Quantization is essential to simplify DNN inference in edge applications. Existing uniform and non-uniform quantization methods, however, exhibit an inherent conflict between the representing range and representing resolution, and thereby result in either underutilized bit-width or significant accuracy drop. Moreover, these methods encounter three drawbacks: i) the absence of a quantitative metric for in-depth analysis of the source of the quantization errors; ii) the limited focus on the image classification tasks based on CNNs; iii) the unawareness of the real hardware and energy consumption reduced by lowering the bit-width. In this paper, we first define two quantitative metrics, i.e., the Clipping Error and rounding error, to analyze the quantization error distribution. We observe that the boundary- and rounding- errors vary significantly across layers, models and tasks. Consequently, we propose a novel quantization method to quantize the weight and activation. The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN. AUSN is consist of a decoder-free coding scheme that efficiently exploits the bit-width to its extreme, a superposition quantization algorithm that can adapt the coding scheme to different DNN layers, models and tasks without extra hardware design effort, and a rounding scheme that can eliminate the well-known bit-width overflow and re-quantization issues. Theoretical analysis~(see Appendix A) and accuracy evaluation on various DNN models of different tasks show the effectiveness and generalization of AUSN. The synthesis~(see Appendix B) results on FPGA show $2\times$ reduction of the energy consumption, and $2\times$ to $4\times$ reduction of the hardware resource.

下载PDF全文

下载文献需遵守相关版权规定

论文标题