Paper Title

Secure Multiparty Computation for Synthetic Data Generation from Distributed Data

Authors

Mayana Pereira, Sikha Pentyala, Anderson Nascimento, Rafael T. de Sousa Jr., Martine De Cock

Abstract

Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Synthetic data generation algorithms with privacy guarantees are emerging as a paradigm to break this data logjam. Existing approaches, however, assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation. This severely limits the applicability, as much of the valuable data in the world is locked up in silos, controlled by entities who cannot show their data to each other or a central aggregator without raising privacy concerns. To overcome this roadblock, we propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation. Data holders send shares to servers who perform Secure Multiparty Computation (MPC) computations while the original data stays encrypted. We instantiate this idea in an MPC protocol for the Multiplicative Weights with Exponential Mechanism (MWEM) algorithm to generate synthetic data based on real data originating from many data holders without reliance on a single point of failure.
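
The abstract's statement that data holders send shares to servers can be illustrated with additive secret sharing. The following is a minimal sketch under stated assumptions, not the paper's protocol: the modulus, the three-server setup, the histogram inputs, and the function names `share`, `reconstruct`, and `aggregate_histograms` are all hypothetical. It only shows how servers could add shares locally without any single server seeing a data holder's raw counts.

```python
import secrets

# Hypothetical modulus for the share arithmetic (an assumption, not from the paper).
MODULUS = 2**61 - 1


def share(value, n_servers, modulus=MODULUS):
    """Split `value` into `n_servers` additive shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_servers - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recombine additive shares into the original value."""
    return sum(shares) % modulus


def aggregate_histograms(histograms, n_servers=3):
    """Each data holder secret-shares its histogram; each server sums the shares it receives."""
    n_bins = len(histograms[0])
    server_totals = [[0] * n_bins for _ in range(n_servers)]
    for hist in histograms:
        for b, count in enumerate(hist):
            for s, sh in enumerate(share(count, n_servers)):
                server_totals[s][b] = (server_totals[s][b] + sh) % MODULUS
    # Reconstruction is done here only to check correctness of the sketch.
    # A differentially private pipeline would not reveal this exact aggregate.
    return [
        reconstruct([server_totals[s][b] for s in range(n_servers)])
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    holders = [[1, 2, 3], [4, 5, 6]]  # toy per-holder histograms
    print(aggregate_histograms(holders))
```

Each individual share is uniformly random, so a single server learns nothing about a holder's counts from the share it receives. In a setting like the one the abstract describes, further steps such as the MWEM updates and the privacy-preserving noise would presumably also run on shares; this sketch does not attempt that.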
