Paper Title

On the Intrinsic Differential Privacy of Bagging

Paper Authors

Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong

Paper Abstract


Differentially private machine learning trains models while protecting the privacy of the sensitive training data. The key to obtaining differentially private models is to introduce noise/randomness into the training process. In particular, existing differentially private machine learning methods add noise to the training data, the gradients, the loss function, and/or the model itself. Bagging, a popular ensemble learning framework, randomly creates subsamples of the training data, trains a base model on each subsample using a base learner, and takes a majority vote among the base models when making predictions. Bagging has intrinsic randomness in the training process because it creates the subsamples randomly. Our major theoretical results show that this intrinsic randomness already makes Bagging differentially private without the need for additional noise. In particular, we prove that, for any base learner, Bagging with and without replacement respectively achieves $\left(N\cdot k \cdot \ln{\frac{n+1}{n}},\, 1- \left(\frac{n-1}{n}\right)^{N\cdot k}\right)$-differential privacy and $\left(\ln{\frac{n+1}{n+1-N\cdot k}},\, \frac{N\cdot k}{n} \right)$-differential privacy, where $n$ is the training data size, $k$ is the subsample size, and $N$ is the number of base models. Moreover, we prove that, if no assumptions about the base learner are made, our derived privacy guarantees are tight. We empirically evaluate Bagging on MNIST and CIFAR10. Our experimental results demonstrate that Bagging achieves significantly higher accuracies than state-of-the-art differentially private machine learning methods with the same privacy budgets.
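The privacy guarantees stated in the abstract depend only on the training data size $n$, the subsample size $k$, and the number of base models $N$, so they can be evaluated directly. Below is a minimal Python sketch, not the authors' implementation: the function names (`bagging_privacy_with_replacement`, `train_bagging`, `predict_bagging`) are illustrative, and the bagging helpers only mimic the subsample-train-vote procedure described in the abstract.

```python
import math
import random
from collections import Counter

def bagging_privacy_with_replacement(n, k, N):
    # (epsilon, delta) from the abstract for Bagging WITH replacement:
    # epsilon = N*k*ln((n+1)/n), delta = 1 - ((n-1)/n)**(N*k)
    eps = N * k * math.log((n + 1) / n)
    delta = 1.0 - ((n - 1) / n) ** (N * k)
    return eps, delta

def bagging_privacy_without_replacement(n, k, N):
    # (epsilon, delta) from the abstract for Bagging WITHOUT replacement
    # (well-defined when N*k <= n):
    # epsilon = ln((n+1)/(n+1-N*k)), delta = N*k/n
    eps = math.log((n + 1) / (n + 1 - N * k))
    delta = N * k / n
    return eps, delta

def train_bagging(training_data, base_learner, N, k, seed=0):
    # Draw N subsamples of size k with replacement and train one base model on each.
    rng = random.Random(seed)
    return [base_learner([rng.choice(training_data) for _ in range(k)])
            for _ in range(N)]

def predict_bagging(models, x):
    # Majority vote among the base models' predictions.
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Example: an MNIST-sized training set (n = 60000), k = 20, N = 10.
    print(bagging_privacy_with_replacement(60000, 20, 10))
    print(bagging_privacy_without_replacement(60000, 20, 10))
```

For the example values above (n = 60000, k = 20, N = 10), the with-replacement formula gives roughly epsilon ≈ 0.0033 and delta ≈ 0.0033, illustrating why small subsamples relative to n yield strong intrinsic privacy.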
