Paper Title
Training Deep Neural Networks with Constrained Learning Parameters
Paper Authors
Paper Abstract
Today's deep learning models are primarily trained on CPUs and GPUs. Although these models tend to have low error, they consume high power and use large amounts of memory owing to their double-precision floating-point learning parameters. Beyond Moore's law, a significant portion of deep learning tasks will run on edge computing systems, which will form an indispensable part of the overall computation fabric. Consequently, training deep learning models for such systems will have to be tailored and adapted to generate models with the following desirable characteristics: low error, low memory, and low power. We believe that deep neural networks (DNNs) whose learning parameters are constrained to a finite set of discrete values, running on neuromorphic computing systems, would be instrumental for intelligent edge computing systems with these desirable characteristics. To this end, we propose the Combinatorial Neural Network Training Algorithm (CoNNTrA), which leverages a coordinate gradient descent-based approach for training deep learning models with finite discrete learning parameters. Next, we elaborate on the theoretical underpinnings and evaluate the computational complexity of CoNNTrA. As a proof of concept, we use CoNNTrA to train deep learning models with ternary learning parameters on the MNIST, Iris, and ImageNet data sets and compare their performance to the same models trained using Backpropagation. We use the following performance metrics for the comparison: (i) training error; (ii) validation error; (iii) memory usage; and (iv) training time. Our results indicate that the CoNNTrA models use 32x less memory and have errors on par with the Backpropagation models.
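To make the core idea concrete, the sketch below illustrates a coordinate-wise search over learning parameters restricted to the ternary set {-1, 0, +1}: each coordinate is visited in turn and set to whichever allowed value minimizes the loss. This is a minimal illustrative sketch, not the authors' CoNNTrA implementation; the function name `ternary_coordinate_descent`, the linear model, and the squared-error loss are assumptions chosen only to keep the example self-contained.

```python
# Hypothetical sketch: coordinate-wise search over ternary weights {-1, 0, +1}.
# NOT the paper's CoNNTrA algorithm; it only illustrates training with learning
# parameters constrained to a finite discrete set.
import numpy as np

def loss(W, X, y):
    """Mean squared error of a single linear layer (illustrative choice)."""
    return np.mean((X @ W - y) ** 2)

def ternary_coordinate_descent(X, y, n_passes=5, seed=0):
    rng = np.random.default_rng(seed)
    values = np.array([-1.0, 0.0, 1.0])          # allowed discrete values
    W = rng.choice(values, size=X.shape[1])      # random ternary initialization
    for _ in range(n_passes):
        for i in range(W.size):                  # visit one coordinate at a time
            best_v, best_l = W[i], loss(W, X, y)
            for v in values:                     # try every allowed value
                W[i] = v
                l = loss(W, X, y)
                if l < best_l:
                    best_v, best_l = v, l
            W[i] = best_v                        # keep the loss-minimizing value
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))
    true_W = rng.choice([-1.0, 0.0, 1.0], size=8)
    y = X @ true_W
    W = ternary_coordinate_descent(X, y)
    print("recovered weights:", W)
    print("final loss:", loss(W, X, y))
```

Because each weight needs only 2 bits to encode one of three values rather than a 64-bit double, such a model stores its parameters in roughly 1/32 of the memory, which is consistent with the 32x memory reduction reported in the abstract.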