Title
Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy
Authors
Abstract
Differential privacy (DP) data synthesizers support public release of sensitive information, offering theoretical guarantees for privacy but limited evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, accuracy of trained classifiers, or performance over a query workload. The ability of these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. Our methodology consists of reproducing empirical conclusions of peer-reviewed papers on real, publicly available data, then re-running these experiments a second time on DP synthetic data, and comparing the results. We instantiate our methodology over a benchmark of recent peer-reviewed papers that analyze public datasets in the ICPSR repository. We model quantitative claims computationally to automate the experimental workflow, and model qualitative claims by reproducing visualizations and comparing the results manually. We then generate DP synthetic datasets using multiple state-of-the-art mechanisms, and estimate the likelihood that these conclusions will hold. We find that state-of-the-art DP synthesizers are able to achieve high epistemic parity for several papers in our benchmark. However, some papers, and particularly some specific findings, are difficult to reproduce for any of the synthesizers. We advocate for a new class of mechanisms that favor stronger utility guarantees and offer privacy protection with a focus on application-specific threat models and risk assessment.
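The evaluation loop the abstract describes (reproduce a conclusion on real data, re-run it on DP synthetic data, compare) can be sketched as a small toy example. This is purely illustrative: the `claim` function, the Laplace-noised-histogram synthesizer, and all parameter values below are stand-ins invented for this sketch, not the paper's benchmark claims or its state-of-the-art synthesizers.

```python
import numpy as np

rng = np.random.default_rng(0)

def claim(data):
    """Stand-in 'empirical conclusion': group 1 outscores group 0 on average."""
    g0 = data[data[:, 0] == 0, 1].mean()
    g1 = data[data[:, 0] == 1, 1].mean()
    return g1 > g0

def dp_synthesize(data, epsilon, bins=10):
    """Toy DP mechanism (not one from the paper): Laplace-noise a
    (group x score) histogram, clip negatives, and resample records."""
    hist, (gedges, sedges) = np.histogramdd(data, bins=(2, bins))
    noisy = np.clip(hist + rng.laplace(scale=1.0 / epsilon, size=hist.shape), 0, None)
    probs = noisy.ravel() / noisy.sum()
    idx = rng.choice(probs.size, size=len(data), p=probs)
    g, s = np.unravel_index(idx, hist.shape)
    # sample scores uniformly within each chosen histogram bin
    score = sedges[s] + rng.uniform(0, np.diff(sedges)[s])
    return np.column_stack([g.astype(float), score])

# Simulated "real" dataset with a genuine group effect.
group = rng.integers(0, 2, 1000)
score = rng.normal(loc=0.5 * group, scale=1.0)
real = np.column_stack([group.astype(float), score])

# Epistemic parity estimate: fraction of synthetic re-runs in which the
# conclusion drawn from the real data still holds.
baseline = claim(real)
runs = [claim(dp_synthesize(real, epsilon=1.0)) == baseline for _ in range(20)]
parity = float(np.mean(runs))
print(f"epistemic parity estimate: {parity:.2f}")
```

In the paper's actual methodology, `claim` would be a quantitative finding extracted from a peer-reviewed analysis (qualitative findings are compared manually via reproduced visualizations), and `dp_synthesize` would be one of several state-of-the-art DP mechanisms.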