Paper Title
Visually Imperceptible Adversarial Patch Attacks on Digital Images
Paper Authors
Paper Abstract
The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted increasing attention. Many algorithms have been proposed to craft powerful adversarial examples. However, most of these algorithms modify global or local regions of pixels without taking network explanations into account. Hence, the perturbations are redundant and easily detected by the human eye. In this paper, we propose a novel method to generate local-region perturbations. The main idea is to find the contributing feature region (CFR) of an image by simulating the human attention mechanism and then to add perturbations to the CFR. Furthermore, a soft mask matrix is designed on the basis of an activation map to finely represent the contribution of each pixel in the CFR. With this soft mask, we develop a new loss function with an inverse temperature to search for optimal perturbations in the CFR. Owing to the network explanations, the perturbations added to the CFR are more effective than those added to other regions. Extensive experiments conducted on CIFAR-10 and ILSVRC2012 demonstrate the effectiveness of the proposed method in terms of attack success rate, imperceptibility, and transferability.
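To make the pipeline described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it derives a Grad-CAM-style soft mask from an activation map and then runs a PGD-style search for a perturbation confined to the CFR, with a logit-scaling factor `beta` standing in for the inverse temperature. The backbone (`resnet18`), the target layer, and all hyperparameter values are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a CFR-masked perturbation search.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def soft_mask_from_activations(image, target_layer):
    """Grad-CAM-style activation map, normalized to [0, 1] as a soft mask."""
    feats = []
    hook = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    logits = model(image)
    hook.remove()
    cls = logits.argmax(dim=1)
    score = logits.gather(1, cls.unsqueeze(1)).sum()
    grad = torch.autograd.grad(score, feats[0])[0]
    weights = grad.mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)
    return cam.detach()                                      # soft mask in [0, 1]

def masked_attack(image, label, mask, steps=40, alpha=1e-2, eps=0.1, beta=10.0):
    """PGD-style search for a perturbation restricted to the CFR by the mask.
    `beta` scales the logits, playing the role of an inverse temperature."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + mask * delta)
        loss = F.cross_entropy(beta * logits, label)         # ascend: untargeted attack
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (image + mask * delta).detach()

# Hypothetical usage:
#   mask  = soft_mask_from_activations(x, model.layer4)
#   x_adv = masked_attack(x, y, mask)
```

The mask keeps the perturbation concentrated on the high-activation region, which is the intuition behind restricting the attack to the CFR rather than the whole image.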