Paper Title
Reusing Trained Layers of Convolutional Neural Networks to Shorten Hyperparameters Tuning Time
Paper Authors
Paper Abstract
Hyperparameter tuning is a time-consuming process, particularly when the architecture of the neural network is decided as part of it. For instance, in convolutional neural networks (CNNs), the number and characteristics of the hidden (convolutional) layers may be part of the search space. This implies that the search process involves training every candidate network architecture. This paper describes a proposal to reuse the weights of hidden (convolutional) layers across different trainings in order to shorten this process. The rationale is that if a set of convolutional layers has already been trained to solve a given problem, the weights computed in that training may be useful when a new convolutional layer is added to the network architecture. The idea has been tested on the CIFAR-10 dataset with different CNN architectures of up to 3 convolutional layers and up to 3 fully connected layers. The experiments compare training time and validation loss with and without reusing convolutional layers. They confirm that the strategy reduces training time and even increases the accuracy of the resulting neural network. This finding opens up the possibility of integrating the strategy into existing AutoML methods in order to reduce the total search time.
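The abstract does not include code, but the reuse strategy it describes can be illustrated with a minimal sketch. The PyTorch code below assumes that reuse consists of copying the trained weights of the existing convolutional layers into the matching leading layers of the enlarged architecture; the helper names (build_cnn, reuse_conv_weights), layer sizes, and channel schedule are illustrative assumptions, not taken from the paper.

```python
import torch.nn as nn

def build_cnn(num_conv_layers, num_fc_layers, num_classes=10):
    """Build a simple CNN for 32x32x3 inputs (e.g. CIFAR-10)."""
    convs, in_ch = [], 3
    for i in range(num_conv_layers):
        out_ch = 32 * (2 ** i)
        convs += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        in_ch = out_ch
    # Spatial size halves after each pooling step.
    feat = in_ch * (32 // (2 ** num_conv_layers)) ** 2
    fcs = [nn.Flatten()]
    for _ in range(num_fc_layers - 1):
        fcs += [nn.Linear(feat, 128), nn.ReLU()]
        feat = 128
    fcs += [nn.Linear(feat, num_classes)]
    return nn.Sequential(*convs, *fcs)

def reuse_conv_weights(smaller, larger):
    """Copy trained conv weights from the smaller net into the matching
    leading conv layers of the larger net (assumes the same channel schedule)."""
    small_convs = [m for m in smaller if isinstance(m, nn.Conv2d)]
    large_convs = [m for m in larger if isinstance(m, nn.Conv2d)]
    for src, dst in zip(small_convs, large_convs):
        dst.load_state_dict(src.state_dict())

# Hypothetical usage during architecture search:
net_2conv = build_cnn(num_conv_layers=2, num_fc_layers=2)
# ... train net_2conv on CIFAR-10 ...
net_3conv = build_cnn(num_conv_layers=3, num_fc_layers=2)
reuse_conv_weights(net_2conv, net_3conv)
# ... train net_3conv; its first two conv layers start from reused weights ...
```

In this sketch the weight copy only works because both candidate networks share the same channel schedule for their common layers; how the paper initialises the newly added layer and the fully connected layers is not specified here, so they simply keep their default random initialisation.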