Paper Title
AEGD: Adaptive Gradient Descent with Energy
Paper Authors
Paper Abstract
We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired convergence rates of batch gradient descent. We also provide an energy-dependent bound on the stationary convergence of AEGD in the stochastic non-convex setting. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a large variety of optimization problems: it is robust with respect to initial data and capable of making rapid initial progress. Stochastic AEGD shows comparable, and often better, generalization performance than SGD with momentum on deep neural networks.
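The abstract does not spell out the update rule, so the sketch below is only an illustration of what a gradient step driven by a dynamically updated, monotonically decreasing energy variable can look like; the specific formulas, the shift constant c, and the function and variable names (aegd, grad_f, eta, r) are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def aegd(grad_f, f, x0, eta=0.1, c=1.0, num_iters=1000):
    """Minimal sketch of an energy-adaptive gradient step (assumed form).

    Assumes f(x) + c > 0 so that sqrt(f(x) + c) is well defined.
    The per-coordinate energy variable r is non-increasing by
    construction, for any step size eta > 0 ("energy stable").
    """
    x = np.asarray(x0, dtype=float)
    # Energy variable initialized from the (shifted) objective value.
    r = np.full_like(x, np.sqrt(f(x) + c))
    for _ in range(num_iters):
        # v is the gradient of sqrt(f(x) + c) by the chain rule.
        v = grad_f(x) / (2.0 * np.sqrt(f(x) + c))
        # The energy can only shrink, regardless of eta.
        r = r / (1.0 + 2.0 * eta * v * v)
        x = x - 2.0 * eta * r * v
    return x

# Usage on a simple quadratic: f(x) = ||x||^2 / 2.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x_star = aegd(grad_f, f, x0=np.array([3.0, -2.0]), eta=0.5)
print(x_star)  # close to the minimizer [0, 0]
```

Under this assumed form, the only hyper-parameters are the step size and the shift constant, which is consistent with the abstract's claim that little tuning is required.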