Paper Title
Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano
Paper Authors
Paper Abstract
Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $ε$ needs to be to protect against certain privacy risks in practice is still not well understood. In this work, we study data reconstruction attacks for discrete data and analyze them under the framework of multiple hypothesis testing. We utilize different variants of the celebrated Fano's inequality to derive upper bounds on the inferential power of a data reconstruction adversary when the model is trained differentially privately. Importantly, we show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $ε$ can be $O(\log M)$ before the adversary gains significant inferential power. Our analysis offers theoretical evidence for the empirical effectiveness of DP against data reconstruction attacks even at relatively large values of $ε$.
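For context, the classical form of Fano's inequality underlying this style of analysis is sketched below; the paper employs several variants of it, so this is only the standard textbook statement, not the paper's own bounds.

```latex
% Classical Fano's inequality. Let $X$ take values in a set of size $M$,
% let $\hat{X}$ be any estimator of $X$ based on an observation $Y$, and
% let $P_e = \Pr[\hat{X} \neq X]$ be the error probability. Then
\[
  H(X \mid Y) \le h_b(P_e) + P_e \log(M - 1),
\]
% where $h_b(\cdot)$ is the binary entropy function. Bounding
% $h_b(P_e) \le 1$ (logarithms base 2) and $\log(M-1) \le \log M$ gives
% the lower bound on the adversary's error:
\[
  P_e \;\ge\; \frac{H(X \mid Y) - 1}{\log M}.
\]
```

Intuitively, when the observation $Y$ (here, the output of a differentially private training procedure) leaves substantial residual uncertainty $H(X \mid Y)$ about the private record $X$, the reconstruction error $P_e$ stays bounded away from zero unless $\log M$ is large relative to that uncertainty, which is consistent with the $O(\log M)$ scaling of the tolerable $ε$ stated in the abstract.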