Paper Title

Effects of momentum scaling for SGD

Authors

Pasechnyuk, Dmitry A., Gasnikov, Alexander, Takáč, Martin

Abstract

The paper studies the properties of stochastic gradient methods with preconditioning. We focus on momentum-updated preconditioners with momentum coefficient $β$. Seeking to explain the practical efficiency of scaled methods, we provide a convergence analysis in a norm associated with the preconditioner, and demonstrate that scaling allows one to get rid of the gradient Lipschitz constant in the convergence rates. Along the way, we emphasize the important role of $β$, which various authors arbitrarily fix at a constant $0.99\ldots9$. Finally, we propose explicit constructive formulas for adaptive $β$ and step size values.
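The abstract describes the scheme only at a high level. As a rough illustration, a minimal sketch of one scaled SGD step with a momentum-updated diagonal preconditioner is given below, assuming the common RMSProp-style form $D_t = β D_{t-1} + (1-β)\, g_t \odot g_t$; this form, and all names and parameter values in the code, are assumptions for illustration, not the paper's exact method or its adaptive formulas.

```python
import numpy as np

def scaled_sgd_step(x, grad, D, beta=0.99, step_size=1e-2, eps=1e-8):
    """One step of SGD with a momentum-updated diagonal preconditioner.

    Hypothetical sketch: D accumulates squared gradients with momentum
    coefficient beta (the paper's $β$); the step is taken elementwise
    in the metric induced by sqrt(D), as in RMSProp-style scaling.
    """
    # Momentum update of the diagonal preconditioner.
    D = beta * D + (1.0 - beta) * grad * grad
    # Preconditioned gradient step: scale each coordinate by 1/sqrt(D).
    x = x - step_size * grad / (np.sqrt(D) + eps)
    return x, D

# Toy usage: minimize f(x) = 0.5 * x^T A x with noisy gradients.
rng = np.random.default_rng(0)
A = np.diag([1.0, 100.0])            # badly conditioned quadratic
x = np.array([1.0, 1.0])
D = np.zeros_like(x)
for _ in range(500):
    grad = A @ x + 0.01 * rng.standard_normal(2)   # stochastic gradient
    x, D = scaled_sgd_step(x, grad, D)
print(x)  # should approach the minimizer at the origin
```

The point of the scaling, as the abstract argues, is that the convergence analysis can then be carried out in the norm induced by $D_t$, so the rate no longer depends on the global gradient Lipschitz constant of the ill-conditioned objective.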
