Paper Title

Adaptive-Gravity: A Defense Against Adversarial Samples

Authors

Ali Mirzaeian, Zhi Tian, Sai Manoj P D, Banafsheh S. Latibari, Ioannis Savidis, Houman Homayoun, Avesta Sasan

Abstract

This paper presents a novel model training solution, denoted as Adaptive-Gravity, for enhancing the robustness of deep neural network classifiers against adversarial examples. We conceptualize the model parameters/features associated with each class as a mass characterized by its centroid location and the spread (standard deviation of the distance) of features around the centroid. We use the centroid associated with each cluster to derive an anti-gravity force that pushes the centroids of different classes away from one another during network training. We then customize an objective function that concentrates each class's features toward its corresponding new centroid, obtained via the anti-gravity force. This methodology results in a larger separation between different masses and reduces the spread of features around each centroid. As a result, samples are pushed away from the space that adversarial examples could be mapped to, effectively increasing the degree of perturbation needed to craft an adversarial example. We implement this training solution as an iterative method consisting of four steps at each iteration: 1) centroid extraction, 2) anti-gravity force calculation, 3) centroid relocation, and 4) gravity training. Adaptive-Gravity's efficiency is evaluated by measuring the corresponding fooling rates against various attack models, including FGSM, MIM, BIM, and PGD, using LeNet and ResNet110 networks benchmarked on the MNIST and CIFAR10 classification problems. Test results show that Adaptive-Gravity not only functions as a powerful instrument for robustifying a model against state-of-the-art adversarial attacks but also effectively improves model training accuracy.
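The first three steps of the iterative loop described in the abstract might be sketched as below. This is a minimal illustration, not the paper's implementation: the function names are invented for this sketch, and a simple 1/d² repulsion between centroids is assumed here for concreteness; the paper's exact anti-gravity force formulation may differ. Step 4 (gravity training, i.e., optimizing the network so each class's features concentrate around its relocated centroid) is omitted since it depends on the specific model and objective.

```python
import numpy as np

def extract_centroids(features, labels, num_classes):
    # Step 1: centroid extraction -- mean feature vector per class.
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def anti_gravity_relocate(centroids, step=0.1):
    # Steps 2-3: compute a repulsive ("anti-gravity") force on each
    # centroid from every other centroid, then relocate each centroid
    # along its net force direction. Magnitude ~ 1/d^2 is an
    # illustrative choice, not necessarily the paper's.
    new_centroids = centroids.copy()
    for i in range(len(centroids)):
        force = np.zeros_like(centroids[i])
        for j in range(len(centroids)):
            if i == j:
                continue
            diff = centroids[i] - centroids[j]
            dist = np.linalg.norm(diff) + 1e-8
            force += diff / dist**3  # unit direction * 1/dist^2
        new_centroids[i] = centroids[i] + step * force
    return new_centroids
```

Each pass of the loop would recompute centroids from the current feature extractor, push them apart, and then train the network (step 4) to pull features toward the relocated centroids, increasing inter-class separation while shrinking intra-class spread.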
