Paper Title
On the Difficulty of Membership Inference Attacks
Paper Authors
Paper Abstract
Recent studies propose membership inference (MI) attacks on deep models, where the goal is to infer whether a sample has been used in the training process. Despite their apparent success, these studies only report the accuracy, precision, and recall of the positive (member) class. Hence, the performance of these attacks on the negative (non-member) class has not been clearly reported. In this paper, we show that the way MI attack performance has been reported is often misleading because these attacks suffer from a high false positive rate, or false alarm rate (FAR), which has not been reported. FAR shows how often the attack model mislabels non-training samples (non-members) as training samples (members). A high FAR makes MI attacks fundamentally impractical, which is particularly significant for tasks such as membership inference where, in reality, the majority of samples belong to the negative (non-training) class. Moreover, we show that current MI attack models can identify the membership of misclassified samples with at best mediocre accuracy, and such samples constitute only a very small portion of the training set. We analyze several new features that have not been comprehensively explored for membership inference before, including distance to the decision boundary and gradient norms, and conclude that deep models' responses are mostly similar between training and non-training samples. We conduct several experiments on image classification tasks, including MNIST, CIFAR-10, CIFAR-100, and ImageNet, using various model architectures, including LeNet, AlexNet, and ResNet. We show that the current state-of-the-art MI attacks cannot achieve high accuracy and low FAR at the same time, even when the attacker is given several advantages. The source code is available at https://github.com/shrezaei/MI-Attack.
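To make the FAR argument concrete, the sketch below (not taken from the paper's released code) computes accuracy, precision, recall, and FAR for a hypothetical attack; the 90% detection rate, 30% false alarm rate, and the member/non-member pool sizes are illustrative assumptions, not results from the paper.

```python
# Minimal sketch: why reporting only accuracy/precision/recall of the
# member class can hide a high false alarm rate (FAR).
import numpy as np

def attack_metrics(y_true, y_pred):
    """y_true / y_pred: 1 = member (training sample), 0 = non-member."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "FAR": fp / (fp + tn),  # fraction of non-members flagged as members
    }

rng = np.random.default_rng(0)

# Hypothetical attack: detects ~90% of members but also mislabels
# ~30% of non-members as members (FAR ~ 0.3). Balanced 1:1 eval pool,
# as commonly used when reporting MI attack results.
n_members, n_nonmembers = 1000, 1000
y_true = np.concatenate([np.ones(n_members), np.zeros(n_nonmembers)]).astype(int)
y_pred = np.concatenate([
    (rng.random(n_members) < 0.9).astype(int),     # members detected
    (rng.random(n_nonmembers) < 0.3).astype(int),  # false alarms
])

print(attack_metrics(y_true, y_pred))
# On the balanced pool this looks respectable: ~80% accuracy,
# ~75% precision, ~90% recall -- while FAR ~ 0.3 goes unreported.
```

The same arithmetic shows the impracticality claim: at a more realistic 1:100 member-to-non-member ratio, an attack with 90% recall and 30% FAR yields roughly 0.9 / (0.9 + 30) ≈ 3% precision, so false alarms overwhelm true detections even though the balanced-pool numbers look strong.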