Paper Title
Subtensor Quantization for Mobilenets
Paper Authors
Paper Abstract
Quantization of deep neural networks (DNNs) has enabled developers to deploy models with a smaller memory footprint and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular Mobilenet architecture has been tuned to reduce parameter size and computational latency with separable depthwise convolutions, but not all quantization algorithms work well on it, and accuracy can suffer relative to the floating-point version. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and our post-training quantized 8-bit inference achieves top-1 accuracy within 0.7% of the floating-point version.
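As background for the quantization loss discussed above (this is the generic per-tensor baseline, not the paper's subtensor method), the following is a minimal sketch of affine 8-bit post-training quantization in NumPy. The function names and the synthetic weight tensor are illustrative assumptions; the example shows how a single per-tensor scale struggles when channels have very different dynamic ranges, as in Mobilenet's depthwise convolutions.

```python
import numpy as np

def quantize_per_tensor(x, num_bits=8):
    """Affine (asymmetric) post-training quantization of a whole tensor.

    Returns the quantized uint8 tensor plus the (scale, zero_point)
    parameters needed to dequantize. Illustrative sketch only.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Ensure zero is exactly representable, as integer inference requires.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Map uint8 codes back to approximate float values.
    return scale * (q.astype(np.float32) - zero_point)

# Hypothetical weight tensor: two groups of channels with ranges that
# differ by 100x, a situation common in depthwise convolution weights.
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(0, 0.01, 64), rng.normal(0, 1.0, 64)])
q, s, zp = quantize_per_tensor(w)
# The shared scale is dominated by the wide channels, so the narrow
# channels are quantized coarsely; this is one source of accuracy loss.
print("max abs reconstruction error:", np.abs(w - dequantize(q, s, zp)).max())
```

Per-channel quantization avoids this by giving each channel its own (scale, zero_point) pair, at the cost of more bookkeeping in the inference kernel; the alternatives proposed in the paper aim to recover accuracy without that per-channel machinery.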