Title
A New Kind of Adversarial Example
Authors
Abstract
Almost all adversarial attacks are formulated to add an imperceptible perturbation to an image in order to fool a model. Here, we consider the opposite: adversarial examples that can fool a human but not a model. A perturbation large enough to be clearly perceptible is added to an image such that the model maintains its original decision, whereas a human will most likely make a mistake if forced to decide (or opt not to decide at all). Existing targeted attacks can be reformulated to synthesize such adversarial examples. Our proposed attack, dubbed NKE, is similar in essence to fooling images, but is more efficient since it uses gradient descent instead of evolutionary algorithms. It also offers a new and unified perspective on the problem of adversarial vulnerability. Experimental results on the MNIST and CIFAR-10 datasets show that our attack is quite effective at fooling deep neural networks. Code is available at https://github.com/aliborji/NKE.
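To make the "reformulated targeted attack" idea concrete, below is a minimal, hypothetical PyTorch-style sketch, not the paper's actual implementation (see the linked repository for that). It assumes a classifier `model`, a base image `x_base` drawn from a class other than the target (or noise), and a `target_label` that the model should keep predicting; the function name `nke_attack` and all hyperparameters are illustrative. The key difference from a standard targeted attack is that the perturbation is left unbounded, so the result can look like the base class to a human while the model still outputs the target class.

```python
import torch
import torch.nn.functional as F

def nke_attack(model, x_base, target_label, steps=200, step_size=0.05):
    """Sketch of an unbounded targeted gradient-descent attack.

    Starting from x_base (an image of another class, or noise), optimize the
    pixels so the model confidently predicts target_label, with no bound on
    how large or visible the change is. A human looking at the result should
    still see the base image, while the model reports target_label.
    """
    x = x_base.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x), target_label)  # targeted loss to minimize
        loss.backward()
        with torch.no_grad():
            x -= step_size * x.grad.sign()  # signed gradient step; a plain gradient step also works
            x.clamp_(0.0, 1.0)              # keep pixels in a valid image range
        x.grad.zero_()
    return x.detach()
```

Because the optimization is a plain gradient descent on the model's loss, synthesizing one such example costs a few hundred forward/backward passes, which is what makes this formulation cheaper than the evolutionary search used for classic fooling images.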