Paper Title
AEGD: Adaptive Gradient Descent with Energy
Paper Authors
Paper Abstract
We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired convergence rates of batch gradient descent. We also provide an energy-dependent bound on the stationary convergence of AEGD in the stochastic non-convex setting. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a large variety of optimization problems: it is robust with respect to initial data and capable of making rapid initial progress. Stochastic AEGD shows comparable, and often better, generalization performance than SGD with momentum on deep neural networks.
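The abstract does not spell out the update rule, so the sketch below is only an illustration of what a gradient step driven by a dynamically updated, monotonically decreasing energy variable can look like; the specific formulas, the shift constant c, and the function and variable names (aegd, grad_f, eta, r) are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def aegd(grad_f, f, x0, eta=0.1, c=1.0, num_iters=1000):
    """Minimal sketch of an energy-adaptive gradient step (assumed form).

    Assumes f(x) + c > 0 so that sqrt(f(x) + c) is well defined.
    The per-coordinate energy variable r is non-increasing by
    construction, for any step size eta > 0 ("energy stable").
    """
    x = np.asarray(x0, dtype=float)
    # Energy variable initialized from the (shifted) objective value.
    r = np.full_like(x, np.sqrt(f(x) + c))
    for _ in range(num_iters):
        # v is the gradient of sqrt(f(x) + c) by the chain rule.
        v = grad_f(x) / (2.0 * np.sqrt(f(x) + c))
        # The energy can only shrink, regardless of eta.
        r = r / (1.0 + 2.0 * eta * v * v)
        x = x - 2.0 * eta * r * v
    return x

# Usage on a simple quadratic: f(x) = ||x||^2 / 2.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x_star = aegd(grad_f, f, x0=np.array([3.0, -2.0]), eta=0.5)
print(x_star)  # close to the minimizer [0, 0]
```

Under this assumed form, the only hyper-parameters are the step size and the shift constant, which is consistent with the abstract's claim that little tuning is required.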