Paper Title

Convergence Analysis of Accelerated Stochastic Gradient Descent under the Growth Condition

Authors

You-Lin Chen, Sen Na, Mladen Kolar

Abstract

We study the convergence of accelerated stochastic gradient descent for strongly convex objectives under the growth condition, which states that the variance of the stochastic gradient is bounded by a multiplicative part that grows with the full gradient and a constant additive part. Through the lens of the growth condition, we investigate four widely used accelerated methods: Nesterov's accelerated method (NAM), the robust momentum method (RMM), the accelerated dual averaging method (DAM+), and implicit DAM+ (iDAM+). While these methods are known to improve the convergence rate of SGD under the condition that the stochastic gradient has bounded variance, it is not well understood how their convergence rates are affected by multiplicative noise. In this paper, we show that these methods all converge to a neighborhood of the optimum with accelerated convergence rates (compared to SGD) even under the growth condition. In particular, NAM, RMM, and iDAM+ enjoy acceleration only under mild multiplicative noise, while DAM+ enjoys acceleration even under large multiplicative noise. Furthermore, we propose a generic tail-averaged scheme that allows the accelerated rates of DAM+ and iDAM+ to nearly attain the theoretical lower bound (up to a logarithmic factor in the variance term). We conduct numerical experiments to support our theoretical conclusions.
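To make the growth condition concrete, a common way to write it (with constants $\rho$ and $\sigma^2$ used here as illustrative notation, not necessarily the paper's) is

$$\mathbb{E}\big[\|g(x) - \nabla f(x)\|^2\big] \;\le\; \rho^2 \|\nabla f(x)\|^2 + \sigma^2,$$

where $g(x)$ denotes the stochastic gradient at $x$; the bounded-variance setting corresponds to $\rho = 0$. The sketch below is a generic illustration, not the paper's DAM+/iDAM+ algorithms: SGD with Nesterov-style momentum on a strongly convex quadratic, with synthetic noise following this growth model and a tail average over the last half of the iterates. All problem sizes, step sizes, and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = np.diag(np.linspace(1.0, 10.0, d))   # strongly convex quadratic: eigenvalues in [1, 10]
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)           # unique minimizer of f(x) = 0.5 x^T A x - b^T x


def stochastic_grad(x, rho=0.5, sigma=0.1):
    """Full gradient plus synthetic noise whose variance equals
    rho^2 ||grad f(x)||^2 + sigma^2, matching the growth condition."""
    grad = A @ x - b
    mult = rho * np.linalg.norm(grad) * rng.standard_normal(d) / np.sqrt(d)
    add = sigma * rng.standard_normal(d) / np.sqrt(d)
    return grad + mult + add


def nesterov_sgd(T=2000, eta=0.05, beta=0.7):
    """SGD with Nesterov-style momentum; returns the last iterate and a
    tail average over the final T // 2 iterates."""
    x = np.zeros(d)
    v = np.zeros(d)
    tail = np.zeros(d)
    count = 0
    for t in range(T):
        y = x + beta * v                 # look-ahead (extrapolation) point
        v = beta * v - eta * stochastic_grad(y)
        x = x + v
        if t >= T // 2:                  # accumulate the tail average
            tail += x
            count += 1
    return x, tail / count


x_last, x_tail = nesterov_sgd()
print("last-iterate error:", np.linalg.norm(x_last - x_star))
print("tail-average error:", np.linalg.norm(x_tail - x_star))
```

In this toy setup the multiplicative noise shrinks together with the gradient as the iterates approach the minimizer, so the tail average mainly suppresses the residual additive noise, which is the role the tail-averaged scheme plays in the paper's analysis.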
