Paper Title

Patch-wise++ Perturbation for Adversarial Targeted Attacks

Paper Authors

Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen

Paper Abstract

Although great progress has been made on adversarial attacks for deep neural networks (DNNs), their transferability is still unsatisfactory, especially for targeted attacks. Two problems behind this have long been overlooked: 1) the conventional setting of $T$ iterations with a step size of $ε/T$ to comply with the $ε$-constraint, in which case most pixels are only allowed to add very small noise, much less than $ε$; and 2) the practice of manipulating noise pixel-wise. However, the features of a pixel extracted by DNNs are influenced by its surrounding regions, and different DNNs generally focus on different discriminative regions in recognition. To tackle these issues, our previous work proposed a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability. Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $ε$-constraint is properly assigned to its surrounding regions by a project kernel. However, targeted attacks aim to push the adversarial examples into the territory of a specific class, and the amplification factor may lead to underfitting. Thus, we introduce a temperature and propose a patch-wise++ iterative method (PIM++) to further improve transferability without significantly sacrificing the performance of the white-box attack. Our method can generally be integrated into any gradient-based attack method. Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 33.1\% for defense models and 31.4\% for normally trained models on average.
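
The abstract describes the update rule only in words. Below is a minimal sketch of a patch-wise iterative update in that spirit: the base step size $ε/T$ is amplified by a factor, and the accumulated noise that overflows the $ε$-constraint is redistributed to neighboring pixels through a kernel. Everything here is illustrative: `grad_fn`, the uniform mean filter standing in for the paper's project kernel $W_p$, and the default values of `beta` and `gamma` are assumptions rather than the authors' exact implementation; in PIM++ the temperature would enter through the softmax inside `grad_fn`.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def patchwise_attack(x, grad_fn, eps=16 / 255.0, T=10, beta=10.0, gamma=16.0, k=3):
    """Illustrative patch-wise iterative update (PIM/PIM++-style sketch).

    x       : clean image as an H x W x C float array in [0, 1]
    grad_fn : returns the gradient of the attack objective w.r.t. the input
              (for a targeted attack, e.g. the negative cross-entropy to the
              target class, optionally with temperature-scaled logits)
    eps     : L_inf budget; T: iterations; beta: amplification factor
    gamma   : project factor; k: size of the stand-in project kernel
    """
    alpha = eps / T                       # conventional step size
    x = x.astype(np.float64)
    x_adv = x.copy()
    a = np.zeros_like(x_adv)              # accumulated amplified noise

    for _ in range(T):
        g = np.sign(grad_fn(x_adv))
        a += beta * alpha * g
        # Noise exceeding the eps-constraint ("cut noise")
        cut = np.clip(np.abs(a) - eps, 0.0, None) * np.sign(a)
        # Redistribute the overflow to the surrounding patch; a per-channel
        # mean filter stands in here for the paper's project kernel W_p
        projected = uniform_filter(cut, size=(k, k, 1))
        x_adv = x_adv + beta * alpha * g + gamma * np.sign(projected)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

In practice `grad_fn` would wrap a surrogate model's autograd call; the only hard constraints the sketch enforces are the valid pixel range and the $ε$-ball around the clean image.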
