Paper Title
Adversarial Robustness in Deep Learning: Attacks on Fragile Neurons
Paper Authors
Paper Abstract
We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer. Using an adversarial targeting algorithm, we correlate these neurons with the distribution of adversarial attacks on the network. Adversarial robustness of neural networks has gained significant attention recently, highlighting the intrinsic weaknesses of deep learning networks against carefully constructed distortions applied to input images. In this paper, we evaluate the robustness of state-of-the-art image classification models trained on the MNIST and CIFAR10 datasets against the fast gradient sign method attack, a simple yet effective method of deceiving neural networks. Our method identifies the specific neurons of a network that are most affected by the adversarial attack being applied. We therefore propose to make fragile neurons more robust against these attacks by compressing features within robust neurons and amplifying the fragile neurons proportionally.
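The attack evaluated in the abstract is the fast gradient sign method (FGSM), which perturbs an input along the sign of the loss gradient. Below is a minimal PyTorch sketch of FGSM for reference; the epsilon value, loss choice, and the assumption that inputs are normalized to [0, 1] are illustrative, not the authors' exact configuration.

```python
# Minimal FGSM sketch (illustrative; hyperparameters are assumptions).
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.1):
    """Perturb `images` one step along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Single-step perturbation: x_adv = x + epsilon * sign(dL/dx)
    adv_images = images + epsilon * images.grad.sign()
    # Keep pixel values in a valid range (assuming inputs in [0, 1]).
    return adv_images.clamp(0.0, 1.0).detach()
```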
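The abstract also describes identifying fragile and robust neurons via nodal dropouts of the first convolutional layer. One plausible reading is to ablate each output channel of that layer in turn and measure the resulting accuracy change; the hook-based ablation and accuracy-drop criterion below are our own illustrative assumptions, not the authors' published procedure.

```python
# Sketch of per-channel ("nodal") dropout of the first conv layer to probe
# neuron fragility. The criterion used here is an assumption for illustration.
import torch

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    correct = total = 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

def channel_fragility(model, first_conv, loader, device="cpu"):
    """Zero each output channel of `first_conv` in turn and record the
    accuracy drop; larger drops suggest the corresponding neuron is more
    influential (and, under attack, potentially more fragile)."""
    baseline = accuracy(model, loader, device)
    drops = []
    for ch in range(first_conv.out_channels):
        def ablate(module, inputs, output, ch=ch):
            output[:, ch] = 0.0  # nodal dropout of one feature map
            return output
        handle = first_conv.register_forward_hook(ablate)
        drops.append(baseline - accuracy(model, loader, device))
        handle.remove()
    return drops
```

Running `channel_fragility` on both clean and FGSM-perturbed data (e.g., via the `fgsm_attack` sketch above) would allow correlating per-neuron sensitivity with the attack's effect, in the spirit of the adversarial targeting described in the abstract.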