Paper Title
imdpGAN: Generating Private and Specific Data with Generative Adversarial Networks
Paper Authors
Paper Abstract
Generative Adversarial Networks (GANs) and their variants have shown promising results in generating synthetic data. However, GANs suffer from two issues: (i) the learning happens around the training samples and the model often ends up memorizing them, consequently compromising the privacy of individual samples, which becomes a major concern when GANs are applied to training data containing personally identifiable information; and (ii) randomness in the generated data, with no control over the specificity of the generated samples. To address these issues, we propose imdpGAN, an information maximizing differentially private Generative Adversarial Network. It is an end-to-end framework that simultaneously achieves privacy protection and learns latent representations. With experiments on the MNIST dataset, we show that imdpGAN preserves the privacy of individual data points and learns latent codes to control the specificity of the generated samples. We perform binary classification on digit pairs to show the utility versus privacy trade-off: the classification accuracy decreases as we increase the privacy level in the framework. We also show experimentally that the training process of imdpGAN is stable but incurs a 10-fold increase in training time compared with other GAN frameworks. Finally, we extend the imdpGAN framework to the CelebA dataset to show how privacy and the learned representations can be used to control the specificity of the output.
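The abstract names two ingredients: differential privacy for individual training samples and information-maximizing latent codes for controlling specificity. The following is a minimal sketch, not the authors' implementation, of how these two ideas are commonly combined in a single GAN training step, assuming PyTorch, a DP-SGD-style discriminator update (per-sample gradient clipping plus Gaussian noise), and an InfoGAN-style auxiliary head Q that recovers the latent code from the generated image. All names (Generator, DiscriminatorQ, dp_step), layer sizes, and hyperparameters are illustrative assumptions, not values taken from the paper.

# Sketch of one imdpGAN-style training step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

Z_DIM, C_DIM, IMG_DIM = 62, 10, 28 * 28   # noise, categorical code, flattened MNIST image

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + C_DIM, 256), nn.ReLU(),
            nn.Linear(256, IMG_DIM), nn.Tanh())
    def forward(self, z, c):
        # The latent code c is concatenated with noise z so it can steer specificity.
        return self.net(torch.cat([z, c], dim=1))

class DiscriminatorQ(nn.Module):
    """Shared body with a real/fake head D and a code-recovery head Q (InfoGAN-style)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2))
        self.d_head = nn.Linear(256, 1)       # real/fake logit
        self.q_head = nn.Linear(256, C_DIM)   # logits over the categorical code
    def forward(self, x):
        h = self.body(x)
        return self.d_head(h), self.q_head(h)

def dp_step(model, loss_per_sample, optimizer, clip=1.0, noise_mult=1.1):
    """DP-SGD-style update: clip each per-sample gradient, then add Gaussian noise.
    Written with a naive per-sample backward pass for clarity, not efficiency."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for loss_i in loss_per_sample:
        grads = torch.autograd.grad(loss_i, params, retain_graph=True, allow_unused=True)
        grads = [g if g is not None else torch.zeros_like(p) for g, p in zip(grads, params)]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip / (norm + 1e-6)).clamp(max=1.0)   # bound each sample's influence
        for s, g in zip(summed, grads):
            s += g * scale
    optimizer.zero_grad()
    n = len(loss_per_sample)
    for p, s in zip(params, summed):
        p.grad = (s + noise_mult * clip * torch.randn_like(s)) / n   # add calibrated noise
    optimizer.step()

# One illustrative step on a placeholder "MNIST" batch.
G, DQ = Generator(), DiscriminatorQ()
opt_d = torch.optim.Adam(DQ.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

real = torch.rand(16, IMG_DIM) * 2 - 1                  # stand-in for a real batch
z = torch.randn(16, Z_DIM)
c_idx = torch.randint(0, C_DIM, (16,))
c = F.one_hot(c_idx, C_DIM).float()
fake = G(z, c)

# Discriminator: per-sample BCE losses, updated privately via dp_step.
d_real, _ = DQ(real)
d_fake, _ = DQ(fake.detach())
loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real), reduction="none")
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake), reduction="none"))
dp_step(DQ, list(loss_d.squeeze(1)), opt_d)

# Generator: fool D and keep the code c recoverable from the output (mutual-information term).
d_out, q_logits = DQ(fake)
loss_g = (F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
          + F.cross_entropy(q_logits, c_idx))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

In this sketch the privacy mechanism only touches the discriminator update, since that is the component that sees real training points, while the generator is trained with an added code-recovery loss so that varying c at sampling time changes which kind of sample is produced. The per-sample clipping loop above is the simplest way to illustrate the idea and is the likely source of the training-time overhead the abstract mentions.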