Paper Title
Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties
Paper Authors
Paper Abstract
Many popular adaptive gradient methods such as Adam and RMSProp rely on an exponential moving average (EMA) to normalize their stepsizes. While the EMA makes these methods highly responsive to new gradient information, recent research has shown that it also causes divergence on at least one convex optimization problem. We propose a novel method called Expectigrad, which adjusts stepsizes according to a per-component unweighted mean of all historical gradients and computes a bias-corrected momentum term jointly between the numerator and denominator. We prove that Expectigrad cannot diverge on any instance of the optimization problem known to cause Adam to diverge. We also establish a regret bound in the general stochastic nonconvex setting that suggests Expectigrad is less susceptible to gradient variance than existing methods are. Testing Expectigrad on several high-dimensional machine learning tasks, we find that it often performs favorably compared to state-of-the-art methods with little hyperparameter tuning.
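The abstract describes Expectigrad's mechanics only at a high level: the stepsize denominator uses an unweighted (arithmetic) average of historical gradient information rather than an EMA, and a bias-corrected momentum term is applied jointly to the numerator and denominator. The sketch below is a minimal illustration of what such an update could look like, based solely on that description and not on the paper's exact algorithm. The function name expectigrad_sketch, the use of squared gradients inside the running average (borrowed from RMSProp/Adam-style normalization), the step count t as the averaging denominator, and the default hyperparameter values are all assumptions introduced here for illustration.

```python
import numpy as np

def expectigrad_sketch(grad_fn, x0, lr=1e-3, beta=0.9, eps=1e-8, steps=1000):
    """Sketch of an Expectigrad-style update based only on the abstract.

    Assumptions (not from the source): the denominator averages *squared*
    gradients, the average is taken over all t steps, and momentum with
    bias correction is applied to the normalized step as a whole.
    """
    x = np.asarray(x0, dtype=float).copy()
    s = np.zeros_like(x)   # running sum of squared gradients (per component)
    m = np.zeros_like(x)   # momentum of the normalized step
    for t in range(1, steps + 1):
        g = grad_fn(x)
        s += g * g                          # unweighted mean = sum / count
        step = g / (eps + np.sqrt(s / t))   # normalize by sqrt of the mean
        m = beta * m + (1.0 - beta) * step  # momentum over the full ratio
        x -= lr * m / (1.0 - beta ** t)     # bias-corrected update
    return x

# Usage: minimize the quadratic f(x) = ||x||^2, whose gradient is 2x.
x_min = expectigrad_sketch(lambda x: 2.0 * x, np.ones(5))
```

The key contrast with Adam-style methods in this sketch is that the denominator is an arithmetic mean over the entire gradient history rather than an exponentially weighted one, which is consistent with the abstract's claim that Expectigrad is less responsive to, and less destabilized by, gradient variance.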