Title
Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata
Authors
Abstract
We present a systematic refactoring of the conventional treatment of privacy analyses, basing it on mathematical concepts from the framework of Quantitative Information Flow (QIF). The approach we suggest brings three principal advantages: it is flexible, allowing for precise quantification and comparison of privacy risks for attacks both known and novel; it can be computationally tractable for very large, longitudinal datasets; and its results are explainable both to politicians and to the general public. We apply our approach to a very large case study: the Educational Censuses of Brazil, curated by the governmental agency INEP. These censuses comprise over 90 attributes of approximately 50 million individuals and have been released longitudinally every year since 2007. Only very recently (2018-2021) have these datasets attracted legislation to regulate their privacy, while at the same time continuing to maintain the openness that had been sought in Brazilian society. INEP's reaction to that legislation was the genesis of our project with them. In our conclusions here we share the scientific, technical, and communication lessons we learned in the process.
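To make "precise quantification of privacy risks" in QIF terms more concrete, here is a minimal sketch, not the authors' implementation, of how Bayes vulnerability can score a simple re-identification attack. It assumes an adversary with a uniform prior over the n individuals who learns a target's quasi-identifier values; the function names and the toy (sex, birth year, state) records are hypothetical illustrations. It computes the prior vulnerability 1/n, the posterior vulnerability after the release, and their ratio (multiplicative Bayes leakage).

```python
from collections import Counter
from typing import Hashable, Sequence


def prior_bayes_vulnerability(n: int) -> float:
    """Chance of guessing a target's identity in one try before seeing any
    data, assuming a uniform prior over n individuals."""
    return 1.0 / n


def posterior_bayes_vulnerability(quasi_ids: Sequence[Hashable]) -> float:
    """Expected chance of re-identifying a uniformly chosen target in one
    guess after learning the target's quasi-identifier value.

    Models the release as a deterministic channel from each record to its
    quasi-identifier tuple. An observed value y occurs with probability
    |class(y)|/n, and the best guess then succeeds with probability
    1/|class(y)|, so each equivalence class contributes exactly 1/n.
    """
    n = len(quasi_ids)
    class_sizes = Counter(quasi_ids).values()
    return sum((size / n) * (1.0 / size) for size in class_sizes)


def multiplicative_leakage(quasi_ids: Sequence[Hashable]) -> float:
    """Ratio of posterior to prior Bayes vulnerability; under the assumptions
    above it equals the number of distinct quasi-identifier values."""
    prior = prior_bayes_vulnerability(len(quasi_ids))
    return posterior_bayes_vulnerability(quasi_ids) / prior


# Hypothetical toy records: (sex, birth year, state) for four individuals.
records = [
    ("F", 1990, "SP"),
    ("F", 1990, "SP"),
    ("M", 1985, "RJ"),
    ("M", 1992, "SP"),
]

print(f"prior     = {prior_bayes_vulnerability(len(records)):.2f}")  # 0.25
print(f"posterior = {posterior_bayes_vulnerability(records):.2f}")   # 0.75
print(f"leakage   = {multiplicative_leakage(records):.2f}")          # 3.00
```

Because the posterior computation needs only one counting pass over the records, it runs in time roughly linear in the dataset size, which suggests one reason such measures can stay tractable on very large releases. The sketch does not attempt the longitudinal, multi-year analysis described in the abstract.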