Paper Title


MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks

Paper Authors

Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng

Paper Abstract


The learning rate (LR) is one of the most important hyper-parameters in the stochastic gradient descent (SGD) algorithm for training deep neural networks (DNNs). However, current hand-designed LR schedules need to manually pre-specify a fixed form, which limits their ability to adapt to practical non-convex optimization problems due to the significant diversification of training dynamics. Meanwhile, a proper LR schedule always needs to be searched from scratch for new tasks, which often differ largely from the previous ones in task variations such as data modalities, network architectures, or training data capacities. To address these learning-rate-schedule setting issues, we propose to parameterize LR schedules with an explicit mapping formulation, called MLR-SNet. The learnable parameterized structure gives MLR-SNet more flexibility to learn a proper LR schedule that complies with the training dynamics of DNNs. Image and text classification benchmark experiments substantiate the capability of our method to achieve proper LR schedules. Moreover, the explicit parameterized structure makes the meta-learned LR schedules transferable and plug-and-play, so they can easily be generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet to query tasks whose training epochs, network architectures, data modalities, or dataset sizes differ from those of the training tasks, and achieve comparable or even better performance compared with hand-designed LR schedules specifically designed for the query tasks. The robustness of MLR-SNet is also substantiated when the training data are biased by corrupted noise. We further prove the convergence of the SGD algorithm equipped with the LR schedule produced by our MLR-SNet, with a convergence rate comparable to the best-known rates of the algorithm for solving the problem.
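To make the "explicit mapping" idea concrete, below is a minimal sketch of how such a learnable LR scheduler could be plugged into SGD: a small recurrent network maps the current training loss to a bounded learning rate at every step. This is only an illustration of the plug-and-play structure described in the abstract, not the authors' released implementation; the class name MLRSNetSketch, the hidden size, the max_lr bound, and the use of the loss as the sole input are assumptions for the example, and the meta-training phase across tasks is omitted.

```python
import torch
import torch.nn as nn

class MLRSNetSketch(nn.Module):
    """Hypothetical sketch of an explicit, learnable LR-schedule mapping.

    Maps the current training loss to a learning rate in (0, max_lr) via an
    LSTM cell, so the predicted LR can react to the training dynamics.
    (The real MLR-SNet is meta-learned on training tasks; only the
    plug-and-play inference structure is sketched here.)
    """
    def __init__(self, hidden_size=50, max_lr=0.1):
        super().__init__()
        self.max_lr = max_lr
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.out = nn.Linear(hidden_size, 1)
        self.state = None  # (h, c) carried across training steps

    def forward(self, loss_value):
        x = loss_value.detach().reshape(1, 1)   # current loss as the input feature
        self.state = self.cell(x, self.state)   # update the recurrent state
        h, _ = self.state
        # Sigmoid keeps the predicted LR bounded in (0, max_lr).
        return self.max_lr * torch.sigmoid(self.out(h)).squeeze()

# Usage sketch: let the (already meta-learned) scheduler drive plain SGD
# on a toy model for one step.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = MLRSNetSketch()

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)
lr = scheduler(loss)                     # LR predicted for this step
for group in optimizer.param_groups:
    group["lr"] = float(lr)              # plug the predicted LR into SGD
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the scheduler is an explicit module rather than a fixed formula, transferring it to a new task amounts to reusing its (meta-learned) weights and feeding it the new task's losses, which is what the abstract means by "transferable and plug-and-play".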
