Paper Title
On Saliency Maps and Adversarial Robustness
Paper Authors
Paper Abstract
A very recent trend has emerged that couples the notions of interpretability and adversarial robustness, unlike earlier efforts which focused solely on good interpretations or on robustness against adversaries. Prior works have shown that adversarially trained models exhibit more interpretable saliency maps than their non-robust counterparts, and that this behavior can be quantified by considering the alignment between the input image and its saliency map. In this work, we provide a different perspective on this coupling and propose a method, Saliency-based Adversarial Training (SAT), that uses saliency maps to improve the adversarial robustness of a model. In particular, we show that using annotations already provided with a dataset, such as bounding boxes and segmentation masks, as weak saliency maps suffices to improve adversarial robustness with no additional effort to generate the perturbations themselves. Our empirical results on the CIFAR-10, CIFAR-100, Tiny ImageNet, and Flower-17 datasets consistently corroborate our claim by showing improved adversarial robustness using our method. We also show how using finer and stronger saliency maps leads to more robust models, and how integrating SAT with existing adversarial training methods further boosts the performance of those methods.