Paper Title

DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization

Authors

Honglu Jiang, Haotian Yu, Xiuzhen Cheng, Jian Pei, Robert Pless, Jiguo Yu

Abstract

Large amounts of high-dimensional and heterogeneous data arise in practical applications and are often published to third parties for data analysis, recommendations, targeted advertising, and reliable predictions. However, publishing these data may disclose sensitive personal information, raising growing concern about privacy violations. Privacy-preserving data publishing has received considerable attention in recent years. Unfortunately, the differentially private publication of high-dimensional data remains a challenging problem. In this paper, we propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases: a Markov-blanket-based attribute clustering phase and an invariant post randomization (PRAM) phase. Specifically, splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable allocation of the privacy budget, while a double-perturbation mechanism satisfying local differential privacy facilitates an invariant PRAM that ensures no loss of statistical information and thus significantly preserves data utility. We also extend the DP2-Pub mechanism to the scenario with a semi-honest server, in which it satisfies local differential privacy. We conduct extensive experiments on four real-world datasets, and the experimental results demonstrate that our mechanism can significantly improve the utility of the published data while satisfying differential privacy.
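To make the idea of an invariant PRAM more concrete, the following is a minimal sketch of the classical two-stage construction from the PRAM literature, not the paper's own double-perturbation mechanism, whose details the abstract does not give. It assumes a single categorical attribute with a known category distribution `pi`, and it uses k-ary randomized response as the first-stage perturbation matrix `P`. The function names `grr_matrix` and `invariant_matrix` are hypothetical. Composing `P` with its Bayes inverse `Q` gives a matrix `R = P Q` that leaves `pi` unchanged in expectation.

```python
import math


def grr_matrix(k, epsilon):
    """k-ary randomized response matrix (assumed first-stage perturbation).

    Entry [i][j] is the probability of reporting category j when the true
    category is i. Requires k >= 2.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # keep true value
    q = (1.0 - p) / (k - 1)                              # switch to another one
    return [[p if i == j else q for j in range(k)] for i in range(k)]


def invariant_matrix(P, pi):
    """Return R = P Q, where Q is the Bayes inverse of P under pi.

    Q[l][i] = P[i][l] * pi[i] / lam[l], with lam = pi P the distribution of
    the perturbed values. By construction, pi R = pi.
    """
    k = len(pi)
    lam = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    Q = [[P[i][j] * pi[i] / lam[j] for i in range(k)] for j in range(k)]
    return [
        [sum(P[i][l] * Q[l][j] for l in range(k)) for j in range(k)]
        for i in range(k)
    ]


# Illustrative values only: three categories and a made-up distribution.
pi = [0.5, 0.3, 0.2]
P = grr_matrix(3, epsilon=1.0)
R = invariant_matrix(P, pi)

# Expected distribution after perturbing with R; it should match pi
# up to floating-point error.
expected_after = [sum(pi[i] * R[i][j] for i in range(3)) for j in range(3)]
print(expected_after)
```

The sketch only illustrates the invariance property. It says nothing about how the privacy budget is allocated across attribute clusters, and in practice `pi` would have to be estimated from data rather than assumed known.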
