Paper Title

Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models

Authors

Yulong Wang, Minghui Zhao, Shenghong Li, Xin Yuan, Wei Ni

Abstract

Typical deep neural network (DNN) backdoor attacks rely on triggers embedded in the inputs. Existing imperceptible triggers are either computationally expensive or suffer from low attack success rates. In this paper, we propose a new backdoor trigger that is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be repeated and mirrored horizontally and/or vertically, then superposed onto three-channel images to train a backdoored DNN model. Dispersed throughout an image, the new trigger produces only a weak perturbation on each individual pixel, yet collectively forms a strongly recognizable pattern that trains and activates the DNN's backdoor. We also show analytically that the trigger becomes increasingly effective as the image resolution increases. Experiments are conducted with the ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers, such as BadNets, Trojaned NN, and Hidden Backdoor, by over an order of magnitude. The new trigger achieves an almost 100% attack success rate, reduces the classification accuracy by less than 0.7%-2.4%, and invalidates state-of-the-art defense techniques.
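To make the construction concrete, the following is a minimal sketch of the trigger described in the abstract: a small uniformly random 3D binary pattern is tiled across the full image, with alternate tiles mirrored horizontally and/or vertically, and superposed onto a three-channel image at a weak per-pixel amplitude. The block size, amplitude, and function names here are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def make_trigger(image_shape, block=4, amplitude=2, seed=0):
    """Sketch of a dispersed-pixel trigger (illustrative parameters).

    A uniformly random binary pattern of shape (block, block, channels)
    is tiled over the whole image; tiles in odd rows are mirrored
    vertically and tiles in odd columns horizontally, then the pattern
    is scaled to a weak per-pixel amplitude.
    """
    h, w, c = image_shape
    rng = np.random.default_rng(seed)
    base = rng.integers(0, 2, size=(block, block, c))  # random 3D binary pattern
    rows = []
    for i in range(-(-h // block)):           # ceil(h / block) tile rows
        tiles = []
        for j in range(-(-w // block)):       # ceil(w / block) tile columns
            t = base
            if i % 2 == 1:                    # mirror vertically on odd rows
                t = t[::-1, :, :]
            if j % 2 == 1:                    # mirror horizontally on odd columns
                t = t[:, ::-1, :]
            tiles.append(t)
        rows.append(np.concatenate(tiles, axis=1))
    trigger = np.concatenate(rows, axis=0)[:h, :w, :]
    return trigger * amplitude                # weak perturbation per pixel

def poison(image, trigger):
    """Superpose the trigger onto a uint8 three-channel image."""
    return np.clip(image.astype(np.int16) + trigger, 0, 255).astype(np.uint8)
```

Because the perturbation amplitude is small (here ±2 intensity levels) but the pattern covers every pixel, each pixel change is imperceptible while the overall pattern remains easy for the network to learn.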
