Paper Title
Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
Paper Authors
Paper Abstract
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified either to stay within their class or to become counterfactuals by "morphing" into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification and the areas that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
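For intuition, below is a minimal Python sketch of the pipeline the abstract describes. It is an illustrative simplification under stated assumptions, not the paper's implementation: `encoder` and `decoder` stand in for the trained adversarial autoencoder, `black_box` for the classifier under explanation, and the Gaussian latent sampling, exemplar selection, and pixel-difference saliency map are simplified stand-ins for the paper's actual generation and rule-extraction procedure.

```python
# Hypothetical sketch of an exemplar-based explanation pipeline in a learned
# latent space. All function names and parameters here are illustrative
# assumptions; only the overall flow mirrors the abstract.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def explain(image, black_box, encoder, decoder, n_samples=1000, sigma=0.5):
    """Explain black_box's decision on `image` via latent-space exemplars.

    Assumes: encoder(images) -> latent vectors, decoder(latents) -> images,
    black_box(images) -> class labels, all operating on NumPy batches.
    """
    # 1. Encode the image and sample a neighborhood in the latent space.
    z = encoder(image[np.newaxis])[0]
    latents = z + sigma * np.random.randn(n_samples, z.shape[0])

    # 2. Decode the latent samples and label them with the black box.
    decoded = decoder(latents)
    labels = black_box(decoded)

    # 3. Fit an interpretable surrogate (a decision tree) in the latent
    #    space; the root-to-leaf path for z acts as the local decision rule.
    tree = DecisionTreeClassifier(max_depth=4).fit(latents, labels)
    pred = tree.predict(z.reshape(1, -1))[0]

    # 4. Select exemplars (samples the tree assigns to the image's class)
    #    and counter-exemplars (samples pushed into another class), then
    #    decode them back into images for visualization.
    mask = tree.predict(latents) == pred
    exemplars = decoder(latents[mask][:5])
    counter_exemplars = decoder(latents[~mask][:5])

    # 5. A crude saliency map: the pixel-wise difference between the image
    #    and the mean exemplar highlights regions driving the classification.
    saliency = np.abs(image - exemplars.mean(axis=0))
    return exemplars, counter_exemplars, saliency
```

Interpolating in latent space between a selected exemplar and a counter-exemplar, then decoding the intermediate points, would yield the "morphing" visualization the abstract mentions.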