Paper Title
Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network
Paper Authors
Paper Abstract
Adversarial examples provoke weak reliability and potential security issues in deep neural networks. Although adversarial training has been widely studied to improve adversarial robustness, it works in an over-parameterized regime and requires heavy computation and a large memory budget. To bridge adversarial robustness and model compression, we propose a novel adversarial pruning method, Masking Adversarial Damage (MAD), which employs second-order information of the adversarial loss. Using it, we can accurately estimate the adversarial saliency of model parameters and determine which parameters can be pruned without weakening adversarial robustness. Furthermore, we reveal that the model parameters of the initial layers are highly sensitive to adversarial examples and show that the compressed feature representation retains the semantic information of the target objects. Through extensive experiments on three public datasets, we demonstrate that MAD effectively prunes adversarially trained networks without losing adversarial robustness and performs better than previous adversarial pruning methods.
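The abstract does not give the exact MAD formulation, but the following is a minimal, hypothetical PyTorch sketch of the general idea it describes: score each parameter with a second-order (Optimal-Brain-Damage-style) saliency of the adversarial loss, approximating the Hessian diagonal with squared gradients (an empirical-Fisher proxy), and mask the lowest-scoring weights. The helper names (`adversarial_saliency`, `prune_by_saliency`), the Fisher approximation, and the `sparsity` level are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of second-order adversarial-saliency pruning.
# Assumes `model` is a trained classifier and (adv_examples, labels) are
# adversarial inputs generated elsewhere (e.g., by a PGD attack).
import torch
import torch.nn.functional as F


def adversarial_saliency(model, adv_examples, labels):
    """Approximate per-parameter saliency 0.5 * H_ii * w_i^2 of the adversarial loss,
    with the Hessian diagonal H_ii replaced by the squared gradient (empirical Fisher)."""
    model.zero_grad()
    loss = F.cross_entropy(model(adv_examples), labels)
    params = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in params])
    saliency = {}
    for (name, p), g in zip(params, grads):
        h_diag = g.detach() ** 2                      # Hessian-diagonal proxy
        saliency[name] = 0.5 * h_diag * p.detach() ** 2
    return saliency


def prune_by_saliency(model, saliency, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest adversarial saliency."""
    scores = torch.cat([s.flatten() for s in saliency.values()])
    threshold = torch.quantile(scores, sparsity)
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in saliency:
                mask = (saliency[name] > threshold).float()
                p.mul_(mask)                          # apply the pruning mask in place
                masks[name] = mask
    return masks
```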