Paper Title

Weight Update Skipping: Reducing Training Time for Artificial Neural Networks

Paper Authors

Pooneh Safayenikoo, Ismail Akturk

Paper Abstract

Artificial Neural Networks (ANNs) are known as state-of-the-art techniques in Machine Learning (ML) and have achieved outstanding results in data-intensive applications such as recognition, classification, and segmentation. These networks mostly use deep convolutional or fully connected layers with many filters per layer, demanding a large amount of data and tunable hyperparameters to reach competitive accuracy. As a result, the storage, communication, and computational costs of training (in particular, training time) become limiting factors for scaling them up. In this paper, we propose a new training methodology for ANNs that exploits the observation that the improvement in accuracy shows temporal variation, which allows us to skip updating weights when the variation is minuscule. During such time windows, we keep updating the biases, which ensures the network still trains and avoids overfitting; however, we selectively skip updating the weights (and their time-consuming computations). Such a training approach achieves virtually the same accuracy with considerably less computational cost and thus lower training time. We propose two methods for updating weights and evaluate them on four state-of-the-art models (AlexNet, VGG-11, VGG-16, and ResNet-18) on the CIFAR datasets. On average, our two proposed methods, called WUS and WUS+LR, reduce training time (compared to the baseline) by 54% and 50%, respectively, on CIFAR-10, and by 43% and 35% on CIFAR-100.
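
To make the weight-update-skipping idea concrete, the sketch below shows one way it could look in a PyTorch training loop: weight updates are skipped whenever the epoch-to-epoch validation-accuracy improvement falls below a threshold, while bias updates continue every iteration. The skip criterion, the threshold value, and the helper names (split_params, evaluate) are assumptions for illustration; the abstract alone does not specify the authors' exact WUS or WUS+LR procedures.

```python
# Minimal PyTorch-style sketch of the weight-update-skipping (WUS) idea,
# based only on the abstract above. The skip criterion (epoch-to-epoch
# validation-accuracy improvement), the threshold, and helper names such
# as `split_params` and `evaluate` are illustrative assumptions, not the
# authors' exact WUS / WUS+LR algorithms.
import torch
import torch.nn.functional as F


def split_params(model):
    # Separate weight tensors from bias tensors so they can be updated
    # (or skipped) independently.
    weights, biases = [], []
    for name, param in model.named_parameters():
        (biases if name.endswith("bias") else weights).append(param)
    return weights, biases


def train_with_wus(model, train_loader, evaluate, epochs=50,
                   lr=0.01, skip_threshold=1e-3):
    weights, biases = split_params(model)
    # Two parameter groups let the biases keep training while weight
    # updates are skipped.
    optimizer = torch.optim.SGD([{"params": weights}, {"params": biases}],
                                lr=lr, momentum=0.9)
    prev_acc, skip_weights = 0.0, False
    for epoch in range(epochs):
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            loss.backward()
            if skip_weights:
                # Discard weight gradients so only biases are updated.
                # (A real implementation would avoid computing these
                # gradients at all, which is where the time is saved.)
                for p in weights:
                    p.grad = None
            optimizer.step()
        acc = evaluate(model)  # validation accuracy in [0, 1], caller-supplied
        # Skip weight updates while the accuracy improvement is minuscule.
        skip_weights = (acc - prev_acc) < skip_threshold
        prev_acc = acc
```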
