Paper Title
Dissecting adaptive methods in GANs
Paper Authors
Paper Abstract
Adaptive methods are a crucial component widely used for training generative adversarial networks (GANs). While there has been some work to pinpoint the "marginal value of adaptive methods" in standard tasks, it remains unclear why they are still critical for GAN training. In this paper, we formally study how adaptive methods help train GANs; inspired by the grafting method proposed in arXiv:2002.11803 [cs.LG], we separate the magnitude and direction components of the Adam updates, and graft them to the direction and magnitude of SGDA updates respectively. By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training. This motivates us to have a closer look at the class of normalized stochastic gradient descent ascent (nSGDA) methods in the context of GAN training. We propose a synthetic theoretical framework to compare the performance of nSGDA and SGDA for GAN training with neural networks. We prove that in that setting, GANs trained with nSGDA recover all the modes of the true distribution, whereas the same networks trained with SGDA (and any learning rate configuration) suffer from mode collapse. The critical insight in our analysis is that normalizing the gradients forces the discriminator and generator to be updated at the same pace. We also experimentally show that for several datasets, Adam's performance can be recovered with nSGDA methods.
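To make the two update rules discussed in the abstract concrete, here is a minimal sketch of (a) a plain nSGDA step and (b) a grafted step that keeps the magnitude of the Adam update but points along the SGD direction. The function names and hyperparameter values are our own illustrative choices, not taken from the paper, and the sketch treats a single parameter vector rather than a full GAN training loop.

```python
import numpy as np

def nsgda_step(param, grad, lr=0.02, eps=1e-8):
    """Normalized SGDA: move along the raw gradient direction with a fixed
    step size, so generator and discriminator are updated at the same pace."""
    return param - lr * grad / (np.linalg.norm(grad) + eps)

def adam_magnitude_graft_step(param, grad, m, v, t, lr=1e-3,
                              beta1=0.9, beta2=0.999, eps=1e-8):
    """Grafted update: take the *magnitude* of the Adam step but the
    *direction* of the normalized SGD step."""
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    adam_step = lr * m_hat / (np.sqrt(v_hat) + eps)

    # Keep Adam's step size, redirect it along the SGD direction.
    sgd_direction = grad / (np.linalg.norm(grad) + eps)
    grafted_step = np.linalg.norm(adam_step) * sgd_direction
    return param - grafted_step, m, v
```

In an actual GAN setting, the same rule would be applied to both players (gradient descent for the generator, ascent for the discriminator), which is what ties their update magnitudes together as described in the analysis.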