Paper Title
Spatially Correlated Patterns in Adversarial Images
Paper Authors
Paper Abstract
Adversarial attacks have proved to be a major impediment to progress in research on reliable machine learning solutions. Carefully crafted perturbations, imperceptible to human vision, can be added to images to force misclassification by an otherwise high-performing neural network. To better understand the key contributors to such structured attacks, we search for and study spatially co-located patterns in the distribution of pixels in the input space. In this paper, we propose a framework for segregating and isolating regions within an input image that are particularly critical for classification (during inference), for adversarial vulnerability, or for both. We assert that during inference the trained model attends to a specific region of the image, which we call the Region of Importance (RoI), while the attacker targets a region to alter or modify, which we call the Region of Attack (RoA). As our observations illustrate, the success of this approach can also be used to design a post-hoc adversarial defence method. This defence uses the notion of blocking out (which we call neutralizing) the region of the image that is highly vulnerable to adversarial attacks but unimportant for the classification task. We establish a theoretical setup formalising the processes of segregation, isolation and neutralization, and substantiate it through empirical analysis on standard benchmark datasets. The findings strongly indicate that mapping features into the input space preserves the significant patterns typically observed in feature space while adding substantial interpretability, thereby simplifying potential defensive mechanisms.
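To make the neutralization idea concrete, the sketch below masks pixels that score high on attack vulnerability (RoA) but low on classification importance (RoI). This is a minimal illustration, not the paper's actual method: the function name `neutralize`, the per-pixel score maps `roi_map` and `roa_map`, the thresholds, and the fill value are all hypothetical stand-ins, and the paper's procedure for estimating the RoI and RoA maps is not shown here.

```python
import numpy as np

def neutralize(image, roi_map, roa_map, tau_roi=0.5, tau_roa=0.5, fill=0.0):
    """Block out ("neutralize") pixels that are highly attack-vulnerable
    yet unimportant for classification.

    image    : (H, W, C) float array in [0, 1]
    roi_map  : (H, W) per-pixel importance scores, normalized to [0, 1]
               (hypothetical; e.g. derived from a saliency method)
    roa_map  : (H, W) per-pixel attack-vulnerability scores in [0, 1]
               (hypothetical; e.g. derived from observed perturbations)
    tau_roi, tau_roa : assumed thresholds separating low/high scores
    fill     : value written into neutralized pixels (e.g. dataset mean)
    """
    # Pixels inside the Region of Attack but outside the Region of Importance.
    neutral_mask = (roa_map >= tau_roa) & (roi_map < tau_roi)
    out = image.copy()
    out[neutral_mask] = fill  # mask broadcasts over the channel axis
    return out
```

Under these assumptions, the defended input `neutralize(x, roi, roa)` keeps the pixels the classifier relies on intact while removing the attack surface that does not contribute to the prediction.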