Title
dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation
Authors
Abstract
We propose dpart, a general, flexible, and scalable framework and open source Python library for differentially private synthetic data generation. Central to the approach is autoregressive modelling: breaking the joint data distribution into a sequence of lower-dimensional conditional distributions, each captured by a method such as a machine learning model (logistic/linear regression, decision trees, etc.), simple histogram counts, or a custom technique. The library was created to serve as a quick and accessible baseline and to accommodate a wide audience of users, from those taking their first steps in synthetic data generation to more experienced users with domain expertise, who can configure different aspects of the modelling and contribute new methods/mechanisms. Specific instances of dpart include Independent, an optimized version of PrivBayes, and a newly proposed model, dp-synthpop. Code: https://github.com/hazy/dpart
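The autoregressive idea in the abstract can be illustrated with a minimal sketch. It is not dpart's API or implementation. The function names `fit_dp_chain` and `sample_chain` are hypothetical, and the sketch makes several simplifying assumptions: all columns are integer-coded categoricals, each column is conditioned only on the column immediately before it (a chain, rather than on all earlier columns), the privacy budget is split evenly across columns, and each conditional is a histogram perturbed with the Laplace mechanism under add/remove-one-record neighbouring (L1 sensitivity 1 per histogram).

```python
import numpy as np

def fit_dp_chain(data, n_categories, epsilon, rng):
    """Fit noisy conditional histograms p(x_0), p(x_1 | x_0), p(x_2 | x_1), ...

    data: int array of shape (n_rows, n_cols); column j takes values in
    range(n_categories[j]). Hypothetical helper, not part of dpart.
    """
    n_cols = data.shape[1]
    eps_per_col = epsilon / n_cols   # sequential composition: even budget split
    scale = 1.0 / eps_per_col        # Laplace scale for a sensitivity-1 histogram
    tables = []
    for j in range(n_cols):
        if j == 0:
            # Marginal of the first column, stored as a single-row table.
            counts = np.bincount(data[:, 0], minlength=n_categories[0])
            counts = counts.astype(float)[None, :]
        else:
            # Contingency table of (previous column, current column).
            counts = np.zeros((n_categories[j - 1], n_categories[j]))
            np.add.at(counts, (data[:, j - 1], data[:, j]), 1.0)
        noisy = counts + rng.laplace(0.0, scale, size=counts.shape)
        # Post-processing: clip negatives, avoid all-zero rows, normalise rows.
        noisy = np.clip(noisy, 0.0, None) + 1e-9
        tables.append(noisy / noisy.sum(axis=1, keepdims=True))
    return tables

def sample_chain(tables, n_rows, rng):
    """Sample synthetic rows column by column from the fitted conditionals."""
    out = np.zeros((n_rows, len(tables)), dtype=int)
    for i in range(n_rows):
        for j, table in enumerate(tables):
            parent = 0 if j == 0 else out[i, j - 1]
            out[i, j] = rng.choice(table.shape[1], p=table[parent])
    return out

# Usage on toy data: three categorical columns with three levels each.
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(500, 3))
tables = fit_dp_chain(data, [3, 3, 3], epsilon=1.0, rng=rng)
synthetic = sample_chain(tables, 100, rng)
```

Replacing a noisy histogram with another differentially private conditional model (for example a private logistic regression), or changing the column order and the set of parent columns, gives different instances of the same autoregressive scheme, which is the kind of configurability the abstract describes.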