Paper Title
FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation
Paper Authors
Paper Abstract
Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. Nowadays, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and development of dedicated deep learning (DL) hardware to improve the throughput and energy efficiency of DNNs. Low-precision representation of DNN data-structures through quantization would bring great benefits to specialized DL hardware. However, aggressive quantization leads to a severe accuracy drop. As such, quantization opens a large hyper-parameter space at the bit-precision level, the exploration of which is a major challenge. In this paper, we propose a novel framework, referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet), that flexibly designs a mixed low-precision DNN for integer-arithmetic-only deployment. Specifically, FxP-QNet gradually adapts the quantization level for each data-structure of each layer based on the trade-off between network accuracy and low-precision requirements. Additionally, it employs post-training self-distillation and network prediction error statistics to optimize the quantization of floating-point values into fixed-point numbers. Examining FxP-QNet on state-of-the-art architectures and the benchmark ImageNet dataset, we empirically demonstrate the effectiveness of FxP-QNet in achieving the accuracy-compression trade-off without the need for training. The results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95%, 0.95%, and 1.99% accuracy drop, respectively.
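The core operation the abstract refers to is mapping floating-point values onto a dynamic fixed-point representation, where each data-structure is assigned its own word length and fractional length. The Python sketch below is a minimal illustration of that mapping only, not the FxP-QNet algorithm; the function name quantize_to_fixed_point and the parameters word_len and frac_len are assumptions introduced here for illustration.

import numpy as np

def quantize_to_fixed_point(x, word_len=8, frac_len=4):
    """Quantize floats to signed fixed-point with `word_len` total bits,
    `frac_len` of them fractional (round-to-nearest, saturate on overflow).
    Illustrative sketch only; not the FxP-QNet quantizer."""
    scale = 2.0 ** frac_len                        # quantization step is 2 ** -frac_len
    q_min = -(2 ** (word_len - 1))                 # most negative integer code
    q_max = 2 ** (word_len - 1) - 1                # most positive integer code
    codes = np.clip(np.round(x * scale), q_min, q_max)
    return codes / scale                           # de-quantized values, for error measurement

# With a fixed word length, the fractional length trades range against resolution.
w = np.random.randn(1000).astype(np.float32)
for f in (2, 4, 6):
    err = np.abs(w - quantize_to_fixed_point(w, word_len=8, frac_len=f)).mean()
    print(f"frac_len={f}: mean abs quantization error = {err:.4f}")

Running the loop shows that adding fractional bits reduces rounding error until saturation of the narrower representable range starts to dominate, which is why a per-layer, per-data-structure choice of precision, as the abstract describes, is needed rather than a single global setting.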