Paper Title

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

Paper Authors

Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

Paper Abstract

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization parameters and evaluate their choices on a wide range of neural network models for different application domains, including vision, speech, and language. We focus on quantization techniques that are amenable to acceleration by processors with high-throughput integer math pipelines. We also present a workflow for 8-bit quantization that is able to maintain accuracy within 1% of the floating-point baseline on all networks studied, including models that are more difficult to quantize, such as MobileNets and BERT-large.
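To make the abstract's central idea concrete, below is a minimal sketch of uniform symmetric (scale-only) int8 quantization with max calibration, the style of integer quantization the paper evaluates. The function names and the toy tensor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric int8 quantization: map reals to integers in [-127, 127].

    Max calibration: the scale is chosen so that the largest absolute
    value in the tensor maps onto the edge of the int8 range.
    """
    scale = np.max(np.abs(x)) / 127.0
    # Round to nearest integer and clip to the representable range.
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate real-valued tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight tensor and inspect the error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```

Symmetric (scale-only) quantization has no zero-point, which keeps integer matrix-multiply kernels simple and is why it suits the high-throughput integer math pipelines the abstract mentions.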
