Paper Title

Attention Round for Post-Training Quantization

Paper Authors

Huabin Diao, Gongyan Li, Shaoyun Xu, Yuexing Hao

Paper Abstract

At present, quantization methods for neural network models mainly fall into post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization only needs a small amount of data to complete the quantization process, but the performance of the quantized model is not as good as that of quantization-aware training. This paper presents a novel quantization method called Attention Round. This method gives a parameter w the opportunity to be mapped to all possible quantized values, rather than only the two quantized values nearest to w, during the quantization process. The probability of being mapped to a given quantized value is negatively correlated with the distance between that quantized value and w, and decays following a Gaussian function. In addition, this paper uses the lossy coding length as a measure to assign bit widths to the different layers of the model, solving the mixed-precision quantization problem while effectively avoiding a combinatorial optimization problem. This paper also performs quantization experiments on different models, and the results confirm the effectiveness of the proposed method. For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper requires only 1,024 training samples and 10 minutes to complete the quantization process, achieving quantization performance on par with quantization-aware training.
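
The abstract describes Attention Round only at a high level, so the following is a minimal illustrative sketch of the rounding idea, not the paper's exact procedure: it assumes a uniform quantization grid, a Gaussian kernel width sigma, and independent sampling per weight, none of which are specified in the abstract.

```python
import numpy as np

def attention_round(w, q_values, sigma=0.5, rng=None):
    """Sketch of the idea in the abstract: each weight is mapped to one of
    all candidate quantized values, sampled with a probability that decays
    with a Gaussian of its distance to the weight. sigma, the uniform grid,
    and the sampling scheme are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(w, dtype=np.float64)
    q = np.asarray(q_values, dtype=np.float64)
    # Distance of every weight to every candidate quantized value.
    d = np.abs(w[:, None] - q[None, :])
    # Gaussian-decaying probabilities: larger distance -> lower probability.
    logits = -(d ** 2) / (2.0 * sigma ** 2)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Sample one quantized value per weight from the full grid,
    # not just the two nearest neighbours.
    idx = np.array([rng.choice(len(q), p=p) for p in probs])
    return q[idx]

# Example with a 4-bit symmetric uniform grid (an illustrative choice).
grid = np.linspace(-1.0, 1.0, 16)
weights = np.random.default_rng(0).normal(0.0, 0.3, size=8)
print(attention_round(weights, grid))
```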
