Paper Title
A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification
Paper Authors
Paper Abstract
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While achieving high accuracy, DNNs involve a huge number of parameters and computations, which leads to high memory usage and energy consumption. As a result, deploying DNNs on devices with constrained hardware resources poses significant challenges. To overcome this, various compression techniques have been widely employed to optimize DNN accelerators. A promising approach is quantization, in which full-precision values are stored at low bit-widths. Quantization not only reduces memory requirements but also replaces high-cost operations with low-cost ones. DNN quantization offers flexibility and efficiency in hardware design, making it a widely adopted technique. Since quantization has been extensively utilized in previous works, there is a need for an integrated report that provides an understanding, analysis, and comparison of different quantization approaches. Consequently, we present a comprehensive survey of quantization concepts and methods, with a focus on image classification. We describe clustering-based quantization methods and explore the use of a scale factor parameter for approximating full-precision values. Moreover, we thoroughly review the training of a quantized DNN, including the use of a straight-through estimator and quantization regularization. We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers to quantization. Furthermore, we highlight the evaluation metrics for quantization methods and important benchmarks for image classification. We also present the accuracy of the state-of-the-art methods on CIFAR-10 and ImageNet.
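To make two of the surveyed ideas concrete, below is a minimal sketch of uniform, scale-factor quantization trained with a straight-through estimator (STE). The function name quantize_ste, the symmetric per-tensor scheme, and the PyTorch framing are illustrative assumptions for this listing, not the survey's reference implementation.

```python
# A minimal sketch, assuming PyTorch and a symmetric per-tensor scheme.
# quantize_ste is a hypothetical helper, not an API from the paper.
import torch

def quantize_ste(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Fake-quantize x to num_bits using a symmetric scale factor."""
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for 8-bit signed
    # Scale factor maps the largest magnitude onto the integer range.
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    # Round to the integer grid, clip, then rescale back to floats.
    q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # STE: the forward pass returns the quantized value q, while the
    # backward pass treats rounding/clipping as the identity, so
    # gradients flow to x as if no quantization had happened.
    return x + (q - x).detach()

# Usage: fake-quantize a weight tensor during a forward pass.
w = torch.randn(4, 4, requires_grad=True)
w_q = quantize_ste(w, num_bits=4)
loss = (w_q ** 2).sum()
loss.backward()
print(w.grad.shape)  # gradients reached w through the rounding step
```

The `x + (q - x).detach()` idiom is the usual way to realize the STE in PyTorch: `round()` has zero gradient almost everywhere, so passing the gradient straight through the rounding step is what allows a quantized DNN to be trained with standard gradient descent.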