Paper Title
An Adaptive Gradient Method with Energy and Momentum
Paper Authors
Paper Abstract
We introduce a novel algorithm for gradient-based optimization of stochastic objective functions. The method may be seen as a variant of SGD with momentum equipped with an adaptive learning rate that is automatically adjusted by an 'energy' variable. The method is simple to implement, computationally efficient, and well suited for large-scale machine learning problems. It exhibits unconditional energy stability for any size of the base learning rate. We provide a regret bound under the online convex optimization framework, and we establish an energy-dependent convergence rate of the algorithm to a stationary point in the stochastic non-convex setting. In addition, a sufficient condition is provided to guarantee a positive lower threshold for the energy variable. Our experiments demonstrate that the algorithm converges fast while generalizing better than or as well as SGD with momentum in training deep neural networks, and also compares favorably with Adam.
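
The abstract does not spell out the update rule. As a purely illustrative sketch (not the paper's algorithm), the Python snippet below shows one way an SGD-with-momentum step could be scaled elementwise by an 'energy' variable that decays with the size of the step; the square-root transform of the loss, the hyperparameters eta, beta, c, and the function name are all assumptions introduced here for illustration.

import numpy as np

def energy_momentum_step(theta, grad, loss, r, m, eta=0.1, beta=0.9, c=1.0):
    # Illustrative sketch of an energy-adaptive SGD-with-momentum update;
    # the transform, hyperparameters, and names are assumptions, not the
    # paper's exact algorithm.
    # Gradient of a shifted-square-root objective F(theta) = sqrt(loss + c).
    v = grad / (2.0 * np.sqrt(loss + c))
    # Heavy-ball style momentum on the transformed gradient.
    m = beta * m + (1.0 - beta) * v
    # Elementwise energy variable: non-increasing and positive for any
    # eta > 0, which is the sense in which such schemes are "energy stable".
    r = r / (1.0 + 2.0 * eta * m * m)
    # Parameter update with effective learning rate 2 * eta * r.
    theta = theta - 2.0 * eta * r * m
    return theta, r, m

In a sketch of this kind, r would typically be initialized from the initial loss (e.g., sqrt(f(theta_0) + c)) and m to zeros, with the stochastic gradient and mini-batch loss fed in at each step; these initialization choices are likewise assumptions rather than details stated in the abstract.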