DPD-fVAE: Synthetic Data Generation Using Federated Variational Autoencoders With Differentially-Private Decoder

Paper title

DPD-fVAE: Synthetic Data Generation Using Federated Variational Autoencoders With Differentially-Private Decoder

Authors

Bjarne Pfitzner, Bert Arnrich

Abstract

Federated learning (FL) is getting increased attention for processing sensitive, distributed datasets common to domains such as healthcare. Instead of directly training classification models on these datasets, recent works have considered training data generators capable of synthesising a new dataset which is not protected by any privacy restrictions. Thus, the synthetic data can be made available to anyone, which enables further evaluation of machine learning architectures and research questions off-site. As an additional layer of privacy-preservation, differential privacy can be introduced into the training process. We propose DPD-fVAE, a federated Variational Autoencoder with Differentially-Private Decoder, to synthesise a new, labelled dataset for subsequent machine learning tasks. By synchronising only the decoder component with FL, we can reduce the privacy cost per epoch and thus enable better data generators. In our evaluation on MNIST, Fashion-MNIST and CelebA, we show the benefits of DPD-fVAE and report competitive performance to related work in terms of Fréchet Inception Distance and accuracy of classifiers trained on the synthesised dataset.
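
The idea of synchronising only the decoder can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the aggregation rule, so the clipping and Gaussian noise below follow a generic DP federated-averaging pattern, and every name (`clip_update`, `aggregate_decoder`, `clip_norm`, `noise_multiplier`, the toy client dictionaries) is a hypothetical chosen for illustration. Parameters are plain lists of floats so that the example runs without third-party libraries.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def aggregate_decoder(global_decoder, client_decoders, clip_norm,
                      noise_multiplier, rng):
    """One server round (assumed scheme): sum the clipped decoder updates,
    add Gaussian noise to the sum, then average and apply."""
    n = len(client_decoders)
    summed = [0.0] * len(global_decoder)
    for local in client_decoders:
        # Update = local decoder weights minus the current global weights.
        delta = [l - g for l, g in zip(local, global_decoder)]
        for i, d in enumerate(clip_update(delta, clip_norm)):
            summed[i] += d
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return [g + m for g, m in zip(global_decoder, noisy_mean)]


# Hypothetical toy state: each client keeps its encoder local and
# only ever sends decoder weights to the server.
clients = [
    {"encoder": [0.1, 0.2], "decoder": [1.0, 0.0, 0.0]},
    {"encoder": [0.3, 0.4], "decoder": [0.0, 1.0, 0.0]},
]
global_decoder = [0.0, 0.0, 0.0]

new_decoder = aggregate_decoder(
    global_decoder,
    [c["decoder"] for c in clients],  # encoders are never transmitted
    clip_norm=0.5,
    noise_multiplier=0.0,  # set > 0 for actual noise; 0 keeps the demo deterministic
    rng=random.Random(0),
)
```

Because the encoder weights never leave the client in this sketch, only the decoder update has to be clipped and noised, which is the intuition behind the abstract's claim that the privacy cost per epoch is reduced.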
