Paper title
Private Set Generation with Discriminative Information
Paper authors
Paper abstract
Differentially private data generation techniques have become a promising solution to the data privacy challenge: they enable sharing of data while complying with rigorous privacy guarantees, which is essential for scientific progress in sensitive domains. Unfortunately, restricted by the inherent complexity of modeling high-dimensional distributions, existing private generative models struggle to produce synthetic samples with high utility. In contrast to existing works that aim at fitting the complete data distribution, we directly optimize a small set of samples that are representative of the distribution, under the supervision of discriminative information from downstream tasks. This is generally an easier task and more suitable for private training. Our work provides an alternative view of differentially private generation of high-dimensional data and introduces a simple yet effective method that greatly improves on the sample utility of state-of-the-art approaches.
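
The idea of optimizing a small sample set under discriminative supervision can be illustrated with a toy sketch. The abstract does not say which training signal is used, so everything below is an assumption made for illustration: a logistic-regression classifier stands in for the downstream task, the synthetic samples are fitted by matching gradients, and the real-data gradient is privatized by per-example clipping plus Gaussian noise. The names `fit_synthetic_set`, `private_gradient`, `max_norm`, `noise_mult`, and `bound` are hypothetical, and the sketch does no privacy accounting across steps.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def clip(vec, max_norm):
    # Rescale vec so that its L2 norm is at most max_norm.
    norm = math.sqrt(dot(vec, vec))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [v * scale for v in vec]


def private_gradient(real_x, real_y, w, max_norm, noise_mult, rng):
    # Assumed privatization step: clip each per-example logistic-loss
    # gradient, sum, add Gaussian noise scaled to the clipping norm, average.
    total = [0.0] * len(w)
    for x, y in zip(real_x, real_y):
        r = sigmoid(dot(x, w)) - y
        g = clip([r * xi for xi in x], max_norm)
        total = [t + gi for t, gi in zip(total, g)]
    noisy = [t + rng.gauss(0.0, noise_mult * max_norm) for t in total]
    return [v / len(real_x) for v in noisy]


def fit_synthetic_set(real_x, real_y, syn_y, steps=200, lr=0.5,
                      max_norm=1.0, noise_mult=1.0, bound=3.0, seed=0):
    # Optimize a few synthetic samples (with fixed labels syn_y) so that
    # their classifier gradient matches the noisy real-data gradient.
    rng = random.Random(seed)
    dim = len(real_x[0])
    m = len(syn_y)
    syn_x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
    for _ in range(steps):
        # Fresh random classifier weights; these do not depend on the data.
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        target = private_gradient(real_x, real_y, w, max_norm, noise_mult, rng)
        preds = [sigmoid(dot(s, w)) for s in syn_x]
        resid = [p - t for p, t in zip(preds, syn_y)]
        syn_grad = [sum(r * s[k] for r, s in zip(resid, syn_x)) / m
                    for k in range(dim)]
        diff = [a - b for a, b in zip(syn_grad, target)]
        # Analytic gradient of ||syn_grad - target||^2 w.r.t. each sample.
        for j in range(m):
            s = syn_x[j]
            curv = preds[j] * (1.0 - preds[j]) * dot(diff, s)
            step = [(2.0 / m) * (resid[j] * diff[k] + curv * w[k])
                    for k in range(dim)]
            # Gradient step, then clamp to a bounded range (as for pixels).
            syn_x[j] = [max(-bound, min(bound, s[k] - lr * step[k]))
                        for k in range(dim)]
    return syn_x
```

In this sketch only `private_gradient` reads the real data. The synthetic update consumes its noisy output alone, so it is post-processing of the privatized signal. A real system would additionally track the cumulative (ε, δ) cost over all steps and use a noise source suited to privacy applications.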