Paper Title

Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability

Paper Authors

Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky

Paper Abstract

Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts. These methods evaluate a trained model on a new, "probe" dataset and correlate model predictions with the visual concepts labeled in that dataset. Despite their popularity, they suffer from limitations that are not well understood or articulated in the literature. In this work, we analyze three commonly overlooked factors in concept-based explanations. First, the choice of probe dataset has a profound impact on the generated explanations. Our analysis reveals that different probe datasets may lead to very different explanations, and suggests that the explanations do not generalize outside the probe dataset. Second, we find that concepts in the probe dataset are often less salient and harder to learn than the classes they claim to explain, calling into question the correctness of the explanations. We argue that only visually salient concepts should be used in concept-based explanations. Finally, while existing methods use hundreds or even thousands of concepts, our human studies reveal a much stricter upper bound of 32 concepts or fewer, beyond which the explanations become much less practically useful. We make suggestions for the future development and analysis of concept-based interpretability methods. Code for our analysis and user interface can be found at https://github.com/princetonvisualai/OverlookedFactors.
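To make the pipeline described in the abstract concrete, below is a minimal sketch of one common flavor of concept-based explanation: regressing a trained model's class scores on binary concept annotations from a probe dataset, so that the class is explained as a sparse weighted combination of concepts. This is an illustrative assumption, not the authors' exact method; the variable names and the synthetic data are hypothetical placeholders.

    # Minimal sketch (assumptions, not the paper's implementation):
    # explain one class score as a sparse linear combination of
    # concepts labeled in a probe dataset.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n_images, n_concepts = 500, 40

    # Hypothetical binary concept annotations for each probe image
    # (e.g. "wheel", "sky", "stripes" present or absent).
    probe_concepts = rng.integers(0, 2, size=(n_images, n_concepts)).astype(float)

    # Hypothetical logits of the trained model for one target class,
    # simulated here as a noisy function of a few concepts.
    true_w = np.zeros(n_concepts)
    true_w[:4] = rng.normal(size=4)
    class_logits = probe_concepts @ true_w + 0.1 * rng.normal(size=n_images)

    # "Correlate model predictions with the visual concepts": sparse
    # regression selects the concepts that best explain the class score.
    expl = Lasso(alpha=0.05).fit(probe_concepts, class_logits)

    concept_names = [f"concept_{i}" for i in range(n_concepts)]  # placeholders
    top = np.argsort(-np.abs(expl.coef_))[:5]
    for i in top:
        print(f"{concept_names[i]}: weight {expl.coef_[i]:+.3f}")

In this framing, the paper's three factors map directly onto the sketch: swapping probe_concepts for a different probe dataset can change which concepts get nonzero weights, concepts that are harder to learn than the class itself yield unreliable weights, and an explanation with far more than roughly 32 nonzero weights is too large for people to use in practice.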
