Paper Title
Understanding and Improving Fast Adversarial Training
Paper Authors
Paper Abstract
A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial training with the fast gradient sign method (FGSM) can fail due to a phenomenon called "catastrophic overfitting", in which the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role is simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main reason why FGSM training fails. Based on this observation, we propose a new regularization method, GradAlign, that prevents catastrophic overfitting by explicitly maximizing the gradient alignment inside the perturbation set, thereby improving the quality of the FGSM solution. As a result, GradAlign makes it possible to successfully apply FGSM training to larger $\ell_\infty$-perturbations and to reduce the gap to multi-step adversarial training. The code of our experiments is available at https://github.com/tml-epfl/understanding-fast-adv-training.
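The abstract describes two concrete ingredients: single-step FGSM adversarial training and a GradAlign-style regularizer that penalizes misalignment between input gradients inside the perturbation set. Below is a minimal PyTorch-style sketch for illustration only; it is not the authors' released implementation (see the repository linked above). The helper names (`fgsm_example`, `grad_align_reg`, `training_step`), the use of cross-entropy loss, uniform noise inside the $\ell_\infty$-ball as the second gradient point, the `[0, 1]` input range, and the regularization weight `lambda_reg` are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps):
    """Single-step FGSM adversarial example inside the l_inf ball of radius eps
    (inputs assumed to lie in [0, 1])."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def grad_align_reg(model, x, y, eps):
    """Gradient-alignment penalty: 1 - cosine similarity between the input gradient
    at the clean point and at a uniformly sampled point inside the eps-ball."""
    x1 = x.clone().detach().requires_grad_(True)
    x2 = (x + torch.empty_like(x).uniform_(-eps, eps)).detach().requires_grad_(True)
    # create_graph=True so the penalty can be backpropagated to the model parameters
    g1, = torch.autograd.grad(F.cross_entropy(model(x1), y), x1, create_graph=True)
    g2, = torch.autograd.grad(F.cross_entropy(model(x2), y), x2, create_graph=True)
    cos = F.cosine_similarity(g1.flatten(1), g2.flatten(1), dim=1)
    return (1.0 - cos).mean()

def training_step(model, optimizer, x, y, eps, lambda_reg):
    """One FGSM training step with the alignment penalty added to the adversarial loss.
    lambda_reg is a hyperparameter of this sketch, not a value taken from the paper."""
    x_adv = fgsm_example(model, x, y, eps)
    loss = F.cross_entropy(model(x_adv), y) + lambda_reg * grad_align_reg(model, x, y, eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```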