Paper Title

LG-LSQ: Learned Gradient Linear Symmetric Quantization

Paper Authors

Shih-Ting Lin, Zhaofang Li, Yu-Hsiang Cheng, Hao-Wen Kuo, Chih-Cheng Lu, Kea-Tiong Tang

Paper Abstract

Deep neural networks with lower precision weights and operations at inference time have advantages in terms of the cost of memory space and accelerator power. The main challenge associated with the quantization algorithm is maintaining accuracy at low bit-widths. We propose learned gradient linear symmetric quantization (LG-LSQ) as a method for quantizing weights and activation functions to low bit-widths with high accuracy in integer neural network processors. First, we introduce the scaling simulated gradient (SSG) method for determining the appropriate gradient for the scaling factor of the linear quantizer during the training process. Second, we introduce the arctangent soft round (ASR) method, which differs from the straight-through estimator (STE) method in its ability to prevent the gradient from becoming zero, thereby solving the discrete problem caused by the rounding process. Finally, to bridge the gap between full-precision and low-bit quantization networks, we propose the minimize discretization error (MDE) method to determine an accurate gradient in backpropagation. The ASR+MDE method is a simple alternative to the STE method and is practical for use in different uniform quantization methods. In our evaluation, the proposed quantizer achieved full-precision baseline accuracy in various 3-bit networks, including ResNet18, ResNet34, and ResNet50, and an accuracy drop of less than 1% in the quantization of 4-bit weights and 4-bit activations in lightweight models such as MobileNetV2 and ShuffleNetV2.
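As a rough illustration of the ideas in the abstract, the PyTorch sketch below combines a learned-scale linear symmetric quantizer with an arctangent-based soft-round surrogate in place of the STE: the forward pass rounds hard, while the backward pass uses a smooth gradient that is never exactly zero. The abstract does not give the exact SSG, ASR, or MDE formulas, so the surrogate function and its sharpness parameter `alpha` are assumptions for illustration, and the scale factor's gradient here is plain autograd rather than the paper's SSG.

```python
# A minimal sketch, not the authors' implementation. The arctangent surrogate
# below (including `alpha`) is an assumed stand-in for the paper's ASR method.
import math
import torch

class ArctanSoftRound(torch.autograd.Function):
    """Hard rounding in the forward pass; an arctangent-shaped surrogate
    gradient in the backward pass. Unlike the straight-through estimator,
    whose surrogate gradient is a constant 1, this gradient varies smoothly
    with the distance to the nearest integer and is strictly positive."""

    @staticmethod
    def forward(ctx, x, alpha=4.0):
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        a = ctx.alpha
        # Derivative of the smooth surrogate
        #   s(x) = round(x) + atan(a * tan(pi * (x - round(x)))) / pi,
        # which interpolates the rounding staircase. It equals `a` at
        # integers, ~1/a at half-integers, and never reaches zero.
        t = torch.tan(math.pi * (x - torch.round(x)))
        grad = a * (1 + t * t) / (1 + (a * t) ** 2)
        return grad_output * grad, None

def linear_symmetric_quantize(w, scale, num_bits=3):
    """Linear symmetric quantization with a learnable scale factor. Here the
    scale receives its ordinary autograd gradient; the paper's SSG method
    would determine a more appropriate gradient for it during training."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 3 for 3-bit signed values
    q = torch.clamp(w / scale, -qmax, qmax)  # map onto the integer grid range
    return ArctanSoftRound.apply(q) * scale  # round, then rescale

# Usage: quantize a weight tensor and backpropagate through the quantizer.
w = torch.randn(64, 64, requires_grad=True)
scale = torch.tensor(0.05, requires_grad=True)
loss = linear_symmetric_quantize(w, scale).pow(2).sum()
loss.backward()  # gradients flow through the rounding step to w and scale
```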
