Paper Title
Data-Free Adversarial Perturbations for Practical Black-Box Attack
Paper Authors
Abstract
Neural networks are vulnerable to adversarial examples, which are malicious inputs crafted to fool pre-trained models. Adversarial examples often exhibit black-box transferability, meaning that adversarial examples crafted for one model can fool another model. However, existing black-box attack methods require samples from the training data distribution to improve the transferability of adversarial examples across different models. Because of this data dependence, the fooling ability of adversarial perturbations applies only when the training data are accessible. In this paper, we present a data-free method for crafting adversarial perturbations that can fool a target model without any knowledge of the training data distribution. In the practical black-box attack setting, where attackers have access to neither the target model nor its training data, our method achieves high fooling rates on target models and outperforms other universal adversarial perturbation methods. Our method empirically shows that current deep learning models are still at risk even when attackers do not have access to the training data.
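To make the data-free idea concrete, below is a minimal PyTorch sketch of one plausible way to craft such a perturbation: optimize a single perturbation against a publicly available surrogate model so that it disrupts intermediate feature activations on arbitrary random inputs instead of real training samples, then transfer it to a black-box target. The choice of surrogate (ResNet-18), the feature-disruption loss, and all hyperparameters are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
# Hypothetical sketch of a data-free perturbation attack: a surrogate model
# stands in for the unknown black-box target, and random inputs replace
# training data. Loss and hyperparameters are assumptions, not the paper's.
import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Surrogate (white-box) model used to craft the perturbation.
surrogate = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device).eval()

# Truncate the network to an intermediate feature extractor (drop avgpool/fc).
feature_extractor = torch.nn.Sequential(*list(surrogate.children())[:-2])

epsilon = 10.0 / 255.0  # L_inf budget for the perturbation
delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=5e-3)

for step in range(200):
    # Arbitrary random inputs replace samples from the training distribution.
    x = torch.rand(8, 3, 224, 224, device=device)

    clean_feat = feature_extractor(x).detach()
    adv_feat = feature_extractor((x + delta).clamp(0.0, 1.0))

    # Maximize the gap between clean and perturbed internal features so the
    # perturbation disrupts representations without relying on real data.
    loss = -torch.nn.functional.mse_loss(adv_feat, clean_feat)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Project the perturbation back onto the L_inf ball.
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)

# `delta` can then be added to unseen images and evaluated against a
# black-box target model to measure its transferability.
```

In this sketch the attacker never queries the black-box target during optimization; transferability is what carries the perturbation from the surrogate to the target, which is the threat model the abstract describes.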