Paper Title

Imputation under Differential Privacy

Authors

Das, Soumojit; Drechsler, Jörg; Merrill, Keith; Merrill, Shawn

Abstract

The literature on differential privacy almost invariably assumes that the data to be analyzed are fully observed. In most practical applications this is an unrealistic assumption. A popular strategy to address this problem is imputation, in which missing values are replaced by estimates derived from the observed data. In this paper we evaluate various approaches to answering queries on an imputed dataset in a differentially private manner, and we discuss the trade-offs regarding where along the pipeline privacy is considered. We show that if imputation is done without consideration of privacy, the sensitivity of certain queries can increase linearly with the number of incomplete records. On the other hand, for a general class of imputation strategies, these worst-case scenarios can be greatly reduced by ensuring privacy already during the imputation stage. We use a simulated dataset to demonstrate these results across a number of imputation schemes (both private and non-private) and examine their impact on the utility of a private query on the data.
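The claim that sensitivity can grow linearly with the number of incomplete records can be seen in a small sketch. It assumes a setup chosen purely for illustration and not taken from the paper: values lie in [0, B], k records are observed and m are missing, missing values are filled by non-private mean imputation, and the query is the sum over the imputed dataset. A neighbouring dataset changes one observed value while the missingness pattern stays fixed. The imputed sum then equals S(1 + m/k), where S is the sum of the observed values, so changing one observed value by at most B moves the answer by up to B(1 + m/k). With a single observed record this is B(1 + m), i.e. linear in m. The helper names below are hypothetical.

```python
# Hypothetical illustration (not the paper's code): non-private mean imputation
# followed by a sum query, to show how one changed record propagates into
# every imputed value.


def mean_impute(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def sum_query(values):
    """The query answered on the imputed dataset."""
    return sum(values)


def sum_change_after_imputation(num_observed, num_missing, bound=1.0):
    """Change in the imputed sum when one observed value moves from 0 to `bound`.

    Assumes all observed values start at 0 and the set of missing records is
    the same in both neighbouring datasets.
    """
    base = [0.0] * num_observed + [None] * num_missing
    neighbor = list(base)
    neighbor[0] = bound  # alter a single observed record
    return sum_query(mean_impute(neighbor)) - sum_query(mean_impute(base))


if __name__ == "__main__":
    # With one observed record, the change grows as bound * (1 + m).
    for m in (0, 10, 100):
        print(m, sum_change_after_imputation(num_observed=1, num_missing=m))
```

Without imputation the same sum over observed records would change by at most B. The extra factor comes entirely from every imputed value depending on the changed record, which is why the abstract suggests applying privacy already at the imputation stage (for example, imputing from a privately released statistic) rather than only at query time.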