Paper Title
On Convergence of Gradient Descent Ascent: A Tight Local Analysis
Paper Authors
Abstract
Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x};\mathbf{y})$ where $f$ is strongly concave in $\mathbf{y}$ and possibly nonconvex in $\mathbf{x}$, Lin et al. (2020) proved the convergence of GDA with a stepsize ratio $η_{\mathbf{y}}/η_{\mathbf{x}}=Θ(κ^2)$, where $η_{\mathbf{x}}$ and $η_{\mathbf{y}}$ are the stepsizes for $\mathbf{x}$ and $\mathbf{y}$ and $κ$ is the condition number for $\mathbf{y}$. While this stepsize ratio implies a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the \emph{local convergence} of general \emph{nonconvex-nonconcave} minimax problems. We demonstrate that a stepsize ratio of $Θ(κ)$ is necessary and sufficient for local convergence of GDA to a Stackelberg equilibrium, where $κ$ is the local condition number for $\mathbf{y}$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient (EG) methods. Finally, we conduct several numerical experiments to support our theoretical findings.
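To make the two-timescale GDA updates discussed in the abstract concrete, here is a minimal sketch on a toy quadratic minimax objective. The objective $f(x, y) = \tfrac{1}{2}x^2 + xy - \tfrac{1}{2}y^2$ (strongly concave in $y$), the stepsizes, and the iteration count are illustrative assumptions, not the paper's actual experimental setup.

```python
# Two-timescale Gradient Descent Ascent (GDA) on the toy objective
# f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 (strongly concave in y).
# The min player x takes a gradient-descent step; the max player y takes
# a gradient-ascent step with a larger stepsize eta_y.

def gda(eta_x, eta_y, x=1.0, y=1.0, iters=500):
    """Run simultaneous GDA updates and return the final iterate."""
    for _ in range(iters):
        gx = x + y  # df/dx for the toy objective
        gy = x - y  # df/dy for the toy objective
        x, y = x - eta_x * gx, y + eta_y * gy
    return x, y

# With a larger stepsize for the max player (ratio eta_y / eta_x = 10),
# the iterates converge toward the equilibrium (0, 0).
x_final, y_final = gda(eta_x=0.05, eta_y=0.5)
```

Equal stepsizes can also converge on this toy problem; the paper's point is about how large a ratio is needed in the worst case, governed by the local condition number of the max player's objective.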