Compression-Resistant Backdoor Attack against Deep Neural Networks

Paper Title

Compression-Resistant Backdoor Attack against Deep Neural Networks

Authors

Xue, Mingfu, Wang, Xin, Sun, Shichang, Zhang, Yushu, Wang, Jian, Liu, Weiqiang

Abstract

In recent years, many backdoor attacks based on training data poisoning have been proposed. In practice, however, these attacks are vulnerable to image compression: when backdoor instances are compressed, the features of the backdoor trigger are destroyed, which can degrade attack performance. In this paper, we propose a compression-resistant backdoor attack based on feature consistency training. To the best of our knowledge, this is the first backdoor attack that is robust to image compression. First, both backdoor images and their compressed versions are fed into the deep neural network (DNN) for training. Then, the features of each image are extracted by internal layers of the DNN. Next, the feature difference between backdoor images and their compressed versions is minimized. As a result, the DNN treats the features of compressed images as the features of backdoor images in feature space, and after training the backdoor attack is robust to image compression. Furthermore, we consider three image compression algorithms (JPEG, JPEG2000, and WEBP) in feature consistency training, so the backdoor attack is robust to multiple compression algorithms. Experimental results demonstrate the effectiveness and robustness of the proposed attack. When backdoor instances are compressed, the attack success rate of a common backdoor attack is lower than 10%, while that of our compression-resistant backdoor is greater than 97%. The compression-resistant attack remains robust even when the backdoor images are compressed at low compression quality. In addition, extensive experiments demonstrate that our compression-resistant backdoor attack generalizes to image compression algorithms that were not used during training.
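The feature consistency objective described in the abstract can be illustrated with a minimal, framework-free sketch. The abstract does not state the distance measure, the loss weighting, or how the classification term is formed, so the mean squared error, the `weight` parameter, the summed cross-entropy term, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math


def mse(feat_a, feat_b):
    """Mean squared difference between two equal-length feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)) / len(feat_a)


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def consistency_objective(logits_bd, logits_cmp, feat_bd, feat_cmp,
                          target, weight=1.0):
    """Hypothetical combined loss for one backdoor image and its compressed copy.

    logits_bd / feat_bd:   network outputs and internal-layer features
                           for the backdoor image.
    logits_cmp / feat_cmp: the same quantities for its compressed version.
    target:                index of the attacker's target label.
    weight:                assumed trade-off between the two terms.
    """
    # Both versions should be classified as the target label (assumption).
    classification = (cross_entropy(logits_bd, target)
                      + cross_entropy(logits_cmp, target))
    # Pull the compressed image's features toward the backdoor image's.
    consistency = mse(feat_bd, feat_cmp)
    return classification + weight * consistency


if __name__ == "__main__":
    loss = consistency_objective(
        logits_bd=[0.2, 2.5, -1.0], logits_cmp=[0.4, 1.1, -0.3],
        feat_bd=[0.9, 0.1, 0.4], feat_cmp=[0.5, 0.3, 0.2],
        target=1,
    )
    print(f"combined loss: {loss:.4f}")
```

In a real training loop the features would come from an internal layer of the DNN, and the compressed inputs would be produced with each codec under consideration (JPEG, JPEG2000, WEBP). The sketch only shows how a consistency term could be combined with a classification term.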
