Paper Title
Recurrence of Optimum for Training Weight and Activation Quantized Networks
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for minimizing a stage-wise loss function subject to a discrete set-constraint. While numerous training methods have been proposed, existing studies for full quantization of DNNs are mostly empirical. From a theoretical point of view, we study practical techniques for overcoming the combinatorial nature of network quantization. Specifically, we investigate a simple yet powerful projected gradient-like algorithm for quantizing two-linear-layer networks, which proceeds by repeatedly moving the float weights one step in the negation of a heuristic \emph{fake} gradient of the loss function (the so-called coarse gradient) evaluated at the quantized weights. For the first time, we prove that under mild conditions, the sequence of quantized weights recurrently visits the global optimum of the discrete minimization problem for training fully quantized networks. We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
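To make the coarse-gradient procedure concrete, the following is a minimal sketch, not the paper's exact setup: it assumes a two-linear-layer network with 1-bit weights and a hard-threshold activation, a squared loss on synthetic data, and a clipped-ReLU straight-through surrogate for the activation's derivative. The quantizer, surrogate, data, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w):
    # 1-bit weight quantizer: sign, mapping 0 to +1 (illustrative choice).
    return np.where(w >= 0, 1.0, -1.0)

# Synthetic data: inputs X and regression targets y.
X = rng.standard_normal((128, 16))
y = rng.standard_normal((128, 1))

# Float (auxiliary) weights of the two linear layers.
W1 = 0.1 * rng.standard_normal((16, 32))
W2 = 0.1 * rng.standard_normal((32, 1))

lr = 1e-2
for step in range(500):
    # Forward pass uses the quantized weights and a 1-bit activation.
    Q1, Q2 = quantize(W1), quantize(W2)
    z = X @ Q1
    h = (z > 0).astype(float)          # hard-threshold activation
    err = h @ Q2 - y
    loss = 0.5 * np.mean(err ** 2)     # loss at the quantized weights

    # Coarse (fake) gradient: treat the weight quantizer as the identity and
    # replace the activation's a.e.-zero derivative with a clipped-ReLU
    # surrogate (a common straight-through choice), all evaluated at Q1, Q2.
    gQ2 = h.T @ err / len(X)
    dz = (err @ Q2.T) * ((z > 0) & (z < 1))
    gQ1 = X.T @ dz / len(X)

    # Move the float weights one step in the negation of the coarse gradient;
    # the next iteration re-quantizes them.
    W1 -= lr * gQ1
    W2 -= lr * gQ2

    if step % 100 == 0:
        print(f"step {step}: loss at quantized weights = {loss:.4f}")
```

In the paper's analysis it is the sequence of quantized weights that recurrently visits the global optimum, so empirically one would monitor the loss evaluated at the quantized weights over iterations, as the print statement above suggests.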