Paper Title
Single-step Adversarial training with Dropout Scheduling
Paper Authors
Paper Abstract
Deep learning models have shown impressive performance across a spectrum of computer vision applications, including medical diagnosis and autonomous driving. One of the major concerns with these models is their susceptibility to adversarial attacks. Recognizing the importance of this issue, a growing number of researchers are working towards developing robust models that are less affected by adversarial attacks. Adversarial training methods show promising results in this direction. In the adversarial training regime, models are trained on mini-batches augmented with adversarial samples. Fast and simple methods (e.g., single-step gradient ascent) are used to generate these adversarial samples, in order to reduce computational complexity. It has been shown that models trained using single-step adversarial training methods (in which adversarial samples are generated by a non-iterative method) are only pseudo-robust. Further, this pseudo-robustness is attributed to the gradient masking effect. However, existing works fail to explain when and why the gradient masking effect occurs during single-step adversarial training. In this work, (i) we show that models trained using single-step adversarial training methods learn to prevent the generation of single-step adversaries, and that this is due to over-fitting of the model during the initial stages of training; and (ii) to mitigate this effect, we propose a single-step adversarial training method with dropout scheduling. Unlike models trained using existing single-step adversarial training methods, models trained using the proposed method are robust against both single-step and multi-step adversarial attacks, and their performance is on par with models trained using computationally expensive multi-step adversarial training methods, in both white-box and black-box settings.
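The abstract names the method's two ingredients without detailing them: single-step (non-iterative) adversarial sample generation, and a dropout probability that is scheduled over training. A minimal sketch of those two ingredients under assumed specifics (an FGSM-style sign step and a linear decay of the dropout probability; the function names, the step size, and the decay shape are illustrative, not taken from the paper):

```python
import numpy as np

def single_step_adversary(x, grad, epsilon=8 / 255):
    """Single-step adversarial sample: one gradient-sign (FGSM-style) step
    of size epsilon, clipped back to the valid input range [0, 1]."""
    x_adv = x + epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

def dropout_schedule(epoch, total_epochs, p_init=0.5):
    """Illustrative dropout schedule: decay the dropout probability
    linearly from p_init at epoch 0 down to 0 at the final epoch."""
    return p_init * max(0.0, 1.0 - epoch / total_epochs)
```

During training, each mini-batch would be augmented with `single_step_adversary(...)` outputs, while the model's dropout layers use the probability returned by `dropout_schedule(...)` for the current epoch; the high initial dropout is what counteracts the early over-fitting the paper identifies.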