Paper title
On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares
Paper authors
Paper abstract
We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term applies one more matrix to the residual $x_k - x$ than the remaining terms do: if $x_k - x$ is mainly composed of large singular vectors, stochastic gradient descent leads to quick regularization. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values has a smoothing effect.
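The energy cascade described in the abstract can be observed numerically. The sketch below is not the paper's code; it is a minimal illustration of the standard stochastic-gradient scheme for $\|Ax-b\|_2^2$ (row sampling with probability $\|a_i\|^2/\|A\|_F^2$, i.e. randomized Kaczmarz), run on an arbitrarily chosen symmetric test matrix with a decaying spectrum. All matrix sizes, spectra, and iteration counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Symmetric test matrix with a linearly decaying spectrum (an assumption
# made purely for illustration).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.linspace(1.0, 0.01, n)          # singular values, large to small
A = Q @ np.diag(s) @ Q.T
x_true = rng.standard_normal(n)
b = A @ x_true

# SGD on ||Ax - b||^2 via row sampling: pick row i with probability
# ||a_i||^2 / ||A||_F^2 and project onto its hyperplane.
row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()

x = np.zeros(n)
for k in range(20000):
    i = rng.choice(n, p=probs)
    x = x - (A[i] @ x - b[i]) / row_norms2[i] * A[i]

# Decompose the error along singular directions: components along large
# singular vectors should be resolved first (the energy cascade).
err = Q.T @ (x - x_true)
print(abs(err[0]), abs(err[-1]))  # top direction vs. smallest direction
```

Running this, the error along the top singular direction is typically orders of magnitude smaller than along the smallest one: the iterate is "smooth" long before it has converged, which is the regularization effect the inequality quantifies.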