Paper Title
The Manifold Hypothesis for Gradient-Based Explanations
Paper Authors
Paper Abstract
When do gradient-based explanation algorithms provide perceptually-aligned explanations? We propose a criterion: the feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows us to estimate and generate image manifolds. Through experiments across a range of different datasets -- MNIST, EMNIST, CIFAR10, X-ray pneumonia and Diabetic Retinopathy detection -- we demonstrate that the more a feature attribution is aligned with the tangent space of the data, the more perceptually-aligned it tends to be. We then show that the attributions provided by popular post-hoc methods such as Integrated Gradients and SmoothGrad are more strongly aligned with the data manifold than the raw gradient. Adversarial training also improves the alignment of model gradients with the data manifold. As a consequence, we suggest that explanation algorithms should actively strive to align their explanations with the data manifold. This is an extended version of a CVPR Workshop paper. Code is available at https://github.com/tml-tuebingen/explanations-manifold.
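The central quantity in the abstract is the alignment of a feature attribution with the tangent space of the data manifold, estimated via a VAE decoder. Below is a minimal sketch of how such an alignment score could be computed: the columns of the decoder Jacobian at a latent code z span the tangent space of the generated manifold at decoder(z), and the attribution is orthogonally projected onto that space. The names `decoder`, `z`, and `attribution` are illustrative assumptions, not the authors' API, and this is not claimed to be the paper's exact implementation.

```python
# Sketch (assumed names, not the authors' implementation): measure how much of a
# feature attribution lies in the tangent space of a VAE-estimated manifold.
import torch


def tangent_space_alignment(decoder, z, attribution):
    """Return the fraction of the attribution's norm that lies in the
    tangent space of the generated manifold at x = decoder(z)."""
    # The columns of the decoder Jacobian at z span the tangent space
    # of the generated image manifold at decoder(z).
    J = torch.autograd.functional.jacobian(decoder, z)
    J = J.reshape(-1, z.numel())  # shape: (image_dim, latent_dim)
    # Orthonormal basis for the tangent space (reduced QR decomposition).
    Q, _ = torch.linalg.qr(J)
    a = attribution.reshape(-1).to(J.dtype)
    # Orthogonal projection of the attribution onto the tangent space.
    proj = Q @ (Q.T @ a)
    return (proj.norm() / a.norm()).item()


# Toy usage: a linear "decoder" from a 4-d latent space to 28x28 images
# and a random attribution map.
if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(28 * 28, 4)
    decoder = lambda z: (W @ z).reshape(28, 28)
    z = torch.randn(4)
    attribution = torch.randn(28, 28)
    print(f"tangent-space fraction: {tangent_space_alignment(decoder, z, attribution):.3f}")
```

Because the QR decomposition yields orthonormal columns, the returned fraction lies in [0, 1]: a value of 1 means the attribution lies entirely in the estimated tangent space, while a random attribution in a high-dimensional image space scores close to 0.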