Paper Title

Verifying the Causes of Adversarial Examples

Paper Authors

Honglin Li, Yifei Fan, Frieder Ganz, Anthony Yezzi, Payam Barnaghi

Paper Abstract

The robustness of neural networks is challenged by adversarial examples that contain almost imperceptible perturbations to inputs, which mislead a classifier to incorrect outputs with high confidence. Limited by the extreme difficulty in examining a high-dimensional image space thoroughly, research on explaining and justifying the causes of adversarial examples falls behind studies on attacks and defenses. In this paper, we present a collection of potential causes of adversarial examples and verify (or partially verify) them through carefully designed controlled experiments. The major causes of adversarial examples include model linearity, the one-sum constraint, and the geometry of the categories. To control the effect of those causes, multiple techniques are applied, such as $L_2$ normalization, replacement of loss functions, construction of reference datasets, and novel models using multi-layer perceptron probabilistic neural networks (MLP-PNN) and density estimation (DE). Our experimental results show that geometric factors tend to be more direct causes and statistical factors magnify the phenomenon, especially for assigning high prediction confidence. We believe this paper will inspire more studies to rigorously investigate the root causes of adversarial examples, which in turn provide useful guidance on designing more robust models.
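Of the control techniques named in the abstract, $L_2$ normalization is the simplest to illustrate. The sketch below is a minimal PyTorch example with hypothetical layer sizes, not the paper's actual architecture: it projects the penultimate features onto the unit $L_2$ sphere before the final linear layer, so prediction confidence can no longer be inflated simply by scaling feature magnitude.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2NormalizedClassifier(nn.Module):
    """Toy classifier whose penultimate features are L2-normalized
    before the final linear layer (hypothetical sizes for illustration)."""

    def __init__(self, in_dim=784, hidden_dim=256, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)
        # Project features onto the unit L2 sphere: only the direction of the
        # feature vector, not its magnitude, influences the logits.
        feats = F.normalize(feats, p=2, dim=1)
        return self.classifier(feats)

# Example usage: logits for a batch of 4 flattened 28x28 inputs.
model = L2NormalizedClassifier()
logits = model(torch.randn(4, 784))
print(logits.shape)  # torch.Size([4, 10])
```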
