Paper Title
PalQuant: Accelerating High-precision Networks on Low-precision Accelerators
Paper Authors
Paper Abstract
Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet low-precision quantized models on these DLAs suffer severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, which has rarely been studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations by learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost cross-group information communication between parallel low-precision groups. Extensive experiments demonstrate that PalQuant outperforms state-of-the-art quantization methods in both accuracy and inference speed; e.g., for ResNet-18 quantization, PalQuant obtains 0.52\% higher accuracy and a 1.78$\times$ speedup simultaneously over its 4-bit counterpart on a state-of-the-art 2-bit accelerator. Code is available at \url{https://github.com/huqinghao/PalQuant}.
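The cyclic shuffle idea from the abstract can be made concrete with a short sketch. The following is a minimal, hypothetical illustration, not the authors' implementation (that lives in the linked repository): it assumes "cyclic shuffle" means rotating the order of the parallel channel groups by one position between layers, so each low-precision group receives features from a neighboring group. The function name cyclic_shuffle and the parameter num_groups are illustrative choices, not names from the paper.

import torch

def cyclic_shuffle(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # Split the channel dimension into parallel low-precision groups,
    # rotate the group order by one position, then flatten back, so each
    # group's next-layer input mixes in a neighboring group's features.
    # Illustrative only: the exact PalQuant module may differ.
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    x = x.view(n, num_groups, c // num_groups, h, w)
    x = torch.roll(x, shifts=1, dims=1)  # cyclic rotation over groups
    return x.reshape(n, c, h, w)

# Example: 4 parallel low-precision groups over a 64-channel feature map.
feats = torch.randn(1, 64, 8, 8)
mixed = cyclic_shuffle(feats, num_groups=4)
print(mixed.shape)  # torch.Size([1, 64, 8, 8])

Under this reading, the shuffle is a zero-cost permutation (no arithmetic), which is consistent with the abstract's claim that PalQuant gains accuracy without sacrificing inference speed on low-precision hardware.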