Paper Title
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Paper Authors
Paper Abstract
Quantization is a technique for reducing the computation and memory cost of DNN models, which are growing increasingly large. Existing quantization solutions use fixed-point integer or floating-point data types, which offer limited benefits, as both require more bits to maintain the accuracy of the original models. On the other hand, variable-length quantization uses low-bit quantization for normal values and high-precision quantization for a fraction of outlier values. Even though this line of work brings algorithmic benefits, it also introduces significant hardware overhead due to variable-length encoding and decoding. In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads. ANT leverages two key innovations to exploit the intra-tensor and inter-tensor adaptive opportunities in DNN models. First, we propose a new data type, flint, that combines the advantages of float and int to adapt to the importance of different values within a tensor. Second, we propose an adaptive framework that selects the best type for each tensor according to its distribution characteristics. We design a unified processing element architecture for ANT and show that it integrates easily with existing DNN accelerators. Our design yields a 2.8$\times$ speedup and a 2.5$\times$ energy efficiency improvement over state-of-the-art quantization accelerators.
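To make the abstract's two ideas concrete, here is a minimal Python sketch. The first block illustrates the intuition behind a flint-style tapered code using a hypothetical 4-bit unsigned layout, where the length of the leading run of 1s acts as a unary exponent; the paper's actual flint encoding (sign handling, exponent mapping) differs in its details, so treat this as an illustration of the concept, not the paper's specification.

```python
# Toy 4-bit "flint-like" tapered code (hypothetical layout, NOT the paper's
# exact encoding). The leading run of 1s acts as a unary exponent; the bits
# after its terminating 0 (if any) form the mantissa. Codes near zero behave
# like plain integers (fine steps); large codes behave like floats (coarse
# steps, wide dynamic range).

def decode_toy_flint4(code: int) -> int:
    """Decode a 4-bit unsigned toy tapered code into an integer value."""
    assert 0 <= code < 16
    bits = f"{code:04b}"
    k = len(bits) - len(bits.lstrip("1"))       # leading-1 run length
    if k == 0:
        return int(bits[1:], 2)                 # 0xxx: int-like, values 0..7
    tail = bits[k + 1:]                         # drop run and its 0 terminator
    m = int(tail, 2) if tail else 0
    w = len(tail)                               # mantissa shrinks as k grows
    return (1 << (k + 2)) + (m << (k + 2 - w))  # float-like: 2^(k+2)*(1+m/2^w)

# Decoded values: 0..7 (step 1), 8..14 (step 2), 16/24 (step 8), 32, 64.
print([decode_toy_flint4(c) for c in range(16)])
```

The second block sketches the inter-tensor selection idea: quantize each tensor with every candidate low-bit type and keep the type with the lowest reconstruction error. The int4 and power-of-two candidates and the MSE criterion here are stand-ins chosen for illustration; the paper's framework selects among its own set of types, including flint.

```python
import numpy as np

def quant_int4(x: np.ndarray) -> np.ndarray:
    """Uniform symmetric 4-bit integer quantization (quantize + dequantize)."""
    scale = max(float(np.abs(x).max()), 1e-8) / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

def quant_pot4(x: np.ndarray) -> np.ndarray:
    """Power-of-two ('float-like') 4-bit quantization: sign + 3-bit exponent."""
    scale = max(float(np.abs(x).max()), 1e-8)
    mag = np.abs(x) / scale
    e = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** -8))), -7, 0)
    deq = np.sign(x) * scale * 2.0 ** e
    return np.where(mag < 2.0 ** -7.5, 0.0, deq)  # flush tiny values to zero

def select_type(x, candidates):
    """Return the candidate with the lowest mean-squared reconstruction error."""
    errs = {name: float(np.mean((x - q(x)) ** 2)) for name, q in candidates.items()}
    return min(errs, key=errs.get), errs

candidates = {"int4": quant_int4, "pot4": quant_pot4}
rng = np.random.default_rng(0)
tensors = {
    "gaussian (no outliers)": rng.standard_normal(4096),  # int4 typically wins
    "heavy-tailed (outliers)": rng.standard_t(2, 4096),   # pot4 typically wins
}
for name, t in tensors.items():
    print(name, "->", select_type(t, candidates))
```

Note that every candidate here is a fixed-length 4-bit code, so this per-tensor selection costs only a small type tag per tensor at runtime, which matches the abstract's fixed-length, low-hardware-overhead claim.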