Paper Title
Privacy-Preserving Synthetic Educational Data Generation
Paper Authors
Abstract
Institutions collect massive amounts of learning traces, but they may not disclose them because of privacy concerns. Synthetic data generation opens new opportunities for research in education. In this paper, we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats, and we suggest techniques to guarantee privacy. We evaluate our method on existing massive open educational datasets.