Paper Title
An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
Paper Authors
Paper Abstract
Motivated by the potential for parallel implementation of batch-based algorithms and the accelerated convergence achievable with approximated second-order information, a limited-memory version of the BFGS algorithm has been receiving increasing attention in recent years for large neural network training problems. As the shape of the cost function is generally not quadratic and only becomes approximately quadratic in the vicinity of a minimum, the use of second-order information by L-BFGS can be unreliable during the initial phase of training, i.e., when far from a minimum. Therefore, to control the influence of second-order information as training progresses, we propose a multi-batch L-BFGS algorithm, namely MB-AM, that gradually increases its trust in the curvature information by implementing progressive storage and use of curvature data through a development-based increase (dev-increase) scheme. Using six discriminative modelling benchmark problems, we show empirically that MB-AM has slightly faster convergence and, on average, achieves better solutions than the standard multi-batch L-BFGS algorithm when training MLP and CNN models.
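The abstract describes MB-AM only at a high level, so the following is a minimal sketch of the underlying idea: a standard L-BFGS two-loop recursion whose memory cap starts at zero and is allowed to grow as training "develops", so that curvature information is trusted progressively more over time. The class name AdaptiveMemoryLBFGS, the learning-rate parameter, and the specific dev-increase rule used here (grow the cap by one whenever the mini-batch loss improves on its running best) are assumptions made for illustration, not the paper's exact algorithm.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion over the stored (s, y) pairs."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # scale by gamma = s'y / y'y of the most recent pair (initial Hessian guess)
    s, y = s_list[-1], y_list[-1]
    q *= (s @ y) / (y @ y)
    # second loop: oldest pair to newest
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q  # descent direction -H * grad


class AdaptiveMemoryLBFGS:
    """Sketch of a multi-batch L-BFGS whose memory grows as training develops.

    The dev-increase rule below (one extra curvature pair whenever the loss
    improves on its running best) is an illustrative assumption.
    """

    def __init__(self, max_memory=20, lr=0.1):
        self.max_memory = max_memory
        self.lr = lr
        self.mem_cap = 0            # start with no trust in curvature information
        self.best_loss = np.inf
        self.s_list, self.y_list = [], []

    def step(self, w, loss, grad, prev_grad, prev_step):
        # dev-increase: allow one more curvature pair when training improves
        if loss < self.best_loss:
            self.best_loss = loss
            if self.mem_cap < self.max_memory:
                self.mem_cap += 1

        # store the newest curvature pair only if it satisfies s'y > 0
        if prev_step is not None:
            s, y = prev_step, grad - prev_grad
            if s @ y > 1e-10:
                self.s_list.append(s)
                self.y_list.append(y)

        # keep only the mem_cap most recent pairs
        self.s_list = self.s_list[-self.mem_cap:] if self.mem_cap else []
        self.y_list = self.y_list[-self.mem_cap:] if self.mem_cap else []

        # fall back to plain gradient descent while no pairs are trusted
        if self.s_list:
            d = two_loop_direction(grad, self.s_list, self.y_list)
        else:
            d = -grad
        step = self.lr * d
        return w + step, step
```

In use, the caller evaluates the loss and gradient on the current mini-batch, then passes the previous gradient and previous step so curvature pairs can be formed across batches; early on the optimizer behaves like SGD, and it behaves increasingly like full-memory L-BFGS as the memory cap grows.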