Paper Title
Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders
Paper Authors
Paper Abstract
Synthetic data generation has recently gained widespread attention as a more reliable alternative to traditional data anonymization. The underlying methods were originally developed for image synthesis, so applying them to the tabular and relational datasets typical of healthcare, finance, and other industries is non-trivial. While substantial research has been devoted to generating realistic tabular datasets, the study of synthetic relational databases is still in its infancy. In this paper, we combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases. We then apply the resulting method to two publicly available databases in computational experiments. The results indicate that the structure of the real databases is accurately preserved in the synthetic datasets, even for large datasets with advanced data types.