Paper Title

Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation

Paper Authors

Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, Baoyuan Wu

Paper Abstract

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which can produce erroneous predictions by injecting imperceptible perturbations. In this work, we study the transferability of adversarial examples, which is significant due to the threat it poses to real-world applications where the model architecture or parameters are usually unknown. Many existing works reveal that adversarial examples are likely to overfit the surrogate model they are generated from, limiting their transfer attack performance against different target models. To mitigate the overfitting to the surrogate model, we propose a novel attack method, dubbed reverse adversarial perturbation (RAP). Specifically, instead of minimizing the loss of a single adversarial point, we advocate seeking an adversarial example located in a region of uniformly low loss, by injecting the worst-case perturbation (the reverse adversarial perturbation) at each step of the optimization procedure. The adversarial attack with RAP is formulated as a min-max bi-level optimization problem. By integrating RAP into the iterative attack process, our method finds more stable adversarial examples that are less sensitive to changes of the decision boundary, mitigating the overfitting to the surrogate model. Comprehensive experimental comparisons demonstrate that RAP can significantly boost adversarial transferability. Furthermore, RAP can be naturally combined with many existing black-box attack techniques to further boost transferability. When attacking a real-world image recognition system, the Google Cloud Vision API, we obtain a 22% improvement in targeted attack performance over the compared method. Our code is available at https://github.com/SCLBD/Transfer_attack_RAP.
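The abstract describes RAP as a min-max bi-level problem: the outer loop optimizes the adversarial example to minimize the attack loss, while an inner loop injects a worst-case "reverse" perturbation that maximizes it, so the final example sits in a flat, uniformly low-loss region. For a targeted attack this can be written roughly as

$$\min_{\|x^{adv}-x\|_\infty \le \epsilon}\;\max_{\|n\|_\infty \le \epsilon_n} \mathcal{L}\big(f(x^{adv}+n),\, y_t\big).$$

Below is a minimal PyTorch-style sketch of this idea under common transfer-attack assumptions (sign-gradient updates, an $\ell_\infty$ budget, cross-entropy loss). It is an illustration of the formulation above, not the authors' released code; all names and hyperparameter values (`rap_attack`, `inner_steps`, `eps_n`, `late_start`, etc.) are placeholders.

```python
# Sketch of a targeted transfer attack with reverse adversarial perturbation (RAP).
# Outer loop: minimize the targeted loss of the adversarial example.
# Inner loop: find the worst-case (loss-maximizing) reverse perturbation n.
import torch
import torch.nn.functional as F

def rap_attack(model, x, y_target, eps=16/255, alpha=2/255, steps=400,
               eps_n=16/255, alpha_n=2/255, inner_steps=8, late_start=100):
    """All hyperparameter names/values here are illustrative placeholders."""
    x_adv = x.clone().detach()
    for t in range(steps):
        # Inner maximization: push the current adversarial example back toward
        # a high targeted loss (worst case for the attacker). A "late start"
        # variant only activates this after some outer iterations.
        n = torch.zeros_like(x_adv)
        if t >= late_start:
            for _ in range(inner_steps):
                n.requires_grad_(True)
                loss = F.cross_entropy(model(x_adv + n), y_target)
                grad = torch.autograd.grad(loss, n)[0]
                # Ascend the targeted loss within the eps_n ball.
                n = (n + alpha_n * grad.sign()).clamp(-eps_n, eps_n).detach()
        # Outer minimization: update x_adv so the loss stays low even under
        # the worst-case reverse perturbation (seek a flat low-loss region).
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv + n), y_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv
```

Because the inner loop only changes how the loss is evaluated at each step, this scheme composes naturally with other transfer-enhancing tricks (momentum, input transformations, ensembles of surrogates), which is the combination the abstract refers to.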
