Paper Title

Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data

Author

Farokhi, Farhad

Abstract

Local differential privacy has become the gold standard in the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of the privacy-preserving data (no matter how many samples we gather) is always flatter than the density function of the original data points, because it is convolved with the density function of the privacy-preserving noise. The effect is especially pronounced when using slowly decaying privacy-preserving noise, such as Laplace noise. This can result in under- or over-estimation of the heavy hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of the privacy-preserving noise. This approach also allows us to adapt results from non-parametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
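
To make the flattening effect and its correction concrete, here is a minimal sketch of a deconvoluting kernel density estimator for data perturbed by Laplace noise. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the bandwidth h, the noise scale b, the synthetic standard-normal data, and the function names naive_kde and deconv_kde are all hypothetical choices made here. For Laplace noise with scale b, the characteristic function is 1/(1 + b^2 t^2), so with a Gaussian kernel K the deconvoluting kernel has the closed form K*(u) = K(u) [1 + (b/h)^2 (1 - u^2)].

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def naive_kde(z, grid, h):
    """Ordinary KDE applied directly to the noisy reports z."""
    u = (grid[:, None] - z[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def deconv_kde(z, grid, h, b):
    """Deconvoluting KDE for z = x + Laplace(0, b) noise (assumed setup).

    Uses K*(u) = K(u) * (1 + (b/h)^2 * (1 - u^2)), which follows from
    dividing the Gaussian kernel's Fourier transform by the Laplace
    characteristic function 1 / (1 + b^2 t^2).
    """
    u = (grid[:, None] - z[None, :]) / h
    k_star = gaussian_kernel(u) * (1.0 + (b / h) ** 2 * (1.0 - u**2))
    return k_star.mean(axis=1) / h

# Synthetic example (all values are assumptions for illustration).
rng = np.random.default_rng(0)
n, b, h = 1000, 0.5, 0.6
x = rng.normal(0.0, 1.0, size=n)        # private data, never released
z = x + rng.laplace(0.0, b, size=n)     # locally privatized reports

grid = np.linspace(-15.0, 15.0, 2001)
dx = grid[1] - grid[0]
f_naive = naive_kde(z, grid, h)         # flattened by the noise
f_deconv = deconv_kde(z, grid, h, b)    # noise effect removed
```

Both estimates integrate to one, but the deconvolved estimate has a variance smaller by 2 b^2, which is exactly the variance of the Laplace noise. The deconvolved estimate can dip below zero in the tails, and its quality depends strongly on the bandwidth, so the value of h above should be treated as a placeholder rather than a recommendation.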
