Paper Title
Subtensor Quantization for Mobilenets
Paper Authors
Paper Abstract
Quantization of deep neural networks (DNNs) has enabled developers to deploy models with a smaller memory footprint and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular Mobilenet architecture has been tuned to reduce parameter size and computational latency with separable depthwise convolutions, but not all quantization algorithms work well on it, and accuracy can suffer relative to the floating-point version. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and our post-training quantized 8-bit inference achieves top-1 accuracy within 0.7% of the floating-point version.
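As background for the quantization loss discussed above (this is the generic per-tensor baseline, not the paper's subtensor method), the following is a minimal sketch of affine 8-bit post-training quantization in NumPy. The function names and the synthetic weight tensor are illustrative assumptions; the example shows how a single per-tensor scale struggles when channels have very different dynamic ranges, as in Mobilenet's depthwise convolutions.

```python
import numpy as np

def quantize_per_tensor(x, num_bits=8):
    """Affine (asymmetric) post-training quantization of a whole tensor.

    Returns the quantized uint8 tensor plus the (scale, zero_point)
    parameters needed to dequantize. Illustrative sketch only.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Ensure zero is exactly representable, as integer inference requires.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Map uint8 codes back to approximate float values.
    return scale * (q.astype(np.float32) - zero_point)

# Hypothetical weight tensor: two groups of channels with ranges that
# differ by 100x, a situation common in depthwise convolution weights.
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(0, 0.01, 64), rng.normal(0, 1.0, 64)])
q, s, zp = quantize_per_tensor(w)
# The shared scale is dominated by the wide channels, so the narrow
# channels are quantized coarsely; this is one source of accuracy loss.
print("max abs reconstruction error:", np.abs(w - dequantize(q, s, zp)).max())
```

Per-channel quantization avoids this by giving each channel its own (scale, zero_point) pair, at the cost of more bookkeeping in the inference kernel; the alternatives proposed in the paper aim to recover accuracy without that per-channel machinery.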