Title
Attribute Privacy: Framework and Mechanisms
Authors
Abstract
Ensuring the privacy of training data is a growing concern since many machine learning models are trained on confidential and potentially sensitive data. Much attention has been devoted to methods for protecting individual privacy during analyses of large datasets. However, in many settings, global properties of the dataset may also be sensitive (e.g., the mortality rate in a hospital rather than the presence of a particular patient in the dataset). In this work, we depart from individual privacy to initiate the study of attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis. We propose definitions to capture \emph{attribute privacy} in two relevant cases where global attributes may need to be protected: (1) properties of a specific dataset and (2) parameters of the underlying distribution from which the dataset is sampled. We also provide two efficient mechanisms and one inefficient mechanism that satisfy attribute privacy in these settings. We base our results on a novel use of the Pufferfish framework to account for correlations across attributes in the data, thus addressing "the challenging problem of developing Pufferfish instantiations and algorithms for general aggregate secrets" that was left open by \cite{kifer2014pufferfish}.
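As a toy illustration of the idea (not the paper's actual mechanisms, which are Pufferfish-based and model correlations across attributes), one can hide which of two candidate values a sensitive global attribute takes by adding Gaussian noise to the released statistic, with the noise scale calibrated to the largest gap the secret can induce in that statistic. The function below is a hedged sketch under this simplifying assumption; the name `release_mean` and the parameter `secret_gap` are illustrative, not from the paper.

```python
import math
import random

def release_mean(values, secret_gap, epsilon, delta, rng=None):
    """Toy sketch: release a dataset-level mean while obscuring a
    sensitive global attribute.

    `secret_gap` is an assumed bound on how much the released mean can
    differ between the two secret values being protected (e.g., high vs.
    low mortality rate). Noise is Gaussian with scale calibrated to that
    gap, in the spirit of a Pufferfish-style guarantee; the paper's real
    mechanisms additionally account for attribute correlations.
    """
    rng = rng or random.Random(0)
    # Standard Gaussian-mechanism calibration, applied to the secret gap
    # rather than an individual's sensitivity.
    sigma = secret_gap * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    true_mean = sum(values) / len(values)
    return true_mean + rng.gauss(0.0, sigma)
```

Note the contrast with individual differential privacy: there, noise is scaled to one record's influence (roughly 1/n for a mean), whereas protecting a global attribute requires scaling to the attribute's influence on the statistic, which does not shrink as the dataset grows.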