Paper Title
Generalizing Universal Adversarial Attacks Beyond Additive Perturbations
Paper Authors
Paper Abstract
Previous studies have shown that universal adversarial attacks can fool deep neural networks over a large set of input images with a single human-invisible perturbation. However, current methods for universal adversarial attacks are based on additive perturbations, which cause misclassification when the perturbation is directly added to the input images. In this paper, for the first time, we show that a universal adversarial attack can also be achieved via non-additive perturbation (e.g., spatial transformation). More importantly, to unify both additive and non-additive perturbations, we propose a novel unified yet flexible framework for universal adversarial attacks, called GUAP, which is able to initiate attacks via additive perturbation, non-additive perturbation, or a combination of both. Extensive experiments are conducted on the CIFAR-10 and ImageNet datasets with six deep neural network models, including GoogLeNet, VGG16/19, ResNet101/152, and DenseNet121. The empirical results demonstrate that GUAP achieves attack success rates of up to 90.9% on CIFAR-10 and 99.24% on ImageNet, improvements of more than 15% and 19%, respectively, over current state-of-the-art universal adversarial attacks. The code for reproducing the experiments in this paper is available at https://github.com/TrustAI/GUAP.
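To make the core idea concrete, below is a minimal conceptual sketch of how a single universal non-additive perturbation (a spatial warp applied via a flow field) can be composed with a single universal additive perturbation across a batch of images. This is only an illustrative assumption of how such a composition might look in PyTorch, not the authors' implementation (see the linked repository for that); the names `flow`, `delta`, and `epsilon` are hypothetical.

```python
# Illustrative sketch (not the GUAP implementation): apply one universal spatial
# warp followed by one universal L-infinity-bounded additive perturbation.
import torch
import torch.nn.functional as F


def identity_grid(n, h, w, device):
    """Base sampling grid with coordinates in [-1, 1], shape (n, h, w, 2)."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=device),
        torch.linspace(-1, 1, w, device=device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1)  # x first, as grid_sample expects
    return grid.unsqueeze(0).expand(n, -1, -1, -1)


def apply_universal_perturbations(images, flow, delta, epsilon=8 / 255):
    """Warp a batch with a single universal flow field (non-additive part),
    then add a single universal perturbation clipped to epsilon (additive part)."""
    n, _, h, w = images.shape
    grid = identity_grid(n, h, w, images.device) + flow   # flow: (1, h, w, 2), broadcast over batch
    warped = F.grid_sample(images, grid, align_corners=True)
    delta = delta.clamp(-epsilon, epsilon)                 # hypothetical L-inf budget
    return (warped + delta).clamp(0.0, 1.0)
```

In such a setup, the same `flow` and `delta` tensors would be optimized once and then applied unchanged to every input image, which is what makes the attack "universal"; either component can be zeroed out to recover a purely additive or purely spatial attack.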