Paper Title

Type I Attack for Generative Models

Authors

Chengjin Sun, Sizhe Chen, Jia Cai, Xiaolin Huang

Abstract

Generative models are popular tools with a wide range of applications. Nevertheless, they are as vulnerable to adversarial samples as classifiers are. Existing attack methods mainly focus on generating adversarial examples by adding imperceptible perturbations to the input, which leads to wrong results. In contrast, we focus on another aspect of attack: cheating the model through significant changes to the input. The former induces Type II errors, while the latter causes Type I errors. In this paper, we propose a Type I attack on generative models such as VAEs and GANs. One example in the VAE setting is that we can change an original image significantly into a meaningless one, yet their reconstruction results remain similar. To implement the Type I attack, we destroy the original image by increasing the distance in the input space while keeping the output similar, because different inputs may correspond to similar features due to the properties of deep neural networks. Experimental results show that our attack method effectively generates Type I adversarial examples for generative models on large-scale image datasets.
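
The objective described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch-style formulation, not the authors' exact implementation: it pushes the adversarial input far from the original image in input space while penalizing any change in the model's reconstruction. The model handle `vae`, the image tensor `x_orig`, and all hyperparameters are assumed for illustration.

```python
import torch
import torch.nn.functional as F

def type_i_attack(vae, x_orig, steps=200, lr=0.01, lam=10.0):
    """Sketch of a Type I attack on a reconstruction model (assumed interface).

    vae     : callable returning a reconstruction of its input (hypothetical)
    x_orig  : original image tensor in [0, 1]
    lam     : weight balancing output similarity against input distance
    """
    x_adv = x_orig.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_adv], lr=lr)

    with torch.no_grad():
        target_recon = vae(x_orig)  # reconstruction of the clean input

    for _ in range(steps):
        recon = vae(x_adv)
        # Keep the outputs (reconstructions) similar ...
        output_sim = F.mse_loss(recon, target_recon)
        # ... while increasing the distance from the original in input space.
        input_dist = F.mse_loss(x_adv, x_orig)
        loss = lam * output_sim - input_dist

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        x_adv.data.clamp_(0.0, 1.0)  # keep pixel values in a valid range

    return x_adv.detach()
```

Under these assumptions, the similarity term keeps the reconstruction of the adversarial input close to that of the clean image, while the negative input-distance term drives the input itself toward a significantly different, potentially meaningless image.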
