Concept Gradient: Concept-based Interpretation Without Linear Assumption

Paper Title

Concept Gradient: Concept-based Interpretation Without Linear Assumption

Authors

Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh

Abstract

Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach to concept-based interpretation is the Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and the concepts. Linear separability is usually implicitly assumed but does not hold in general. In this work, we start from the original intent of concept-based interpretation and propose Concept Gradient (CG), which extends concept-based interpretation beyond linear concept functions. We show that, for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in the concept affects the model's prediction, which extends gradient-based interpretation to the concept space. We demonstrate empirically that CG outperforms CAV on both toy examples and real-world datasets.
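
The idea of measuring how a small change in a concept affects the prediction can be illustrated with a toy sketch. The abstract does not state a formula, so the following rests on an assumption: for a single scalar concept g(x) and model output f(x), the chain rule is inverted with a pseudo-inverse, giving (grad g . grad f) / ||grad g||^2. The functions `concept`, `model`, `numerical_gradient`, and `concept_gradient` are hypothetical names chosen for illustration, and the gradients are approximated by finite differences rather than taken from a trained network.

```python
import numpy as np


def numerical_gradient(fn, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function fn at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return grad


def concept(x):
    # Hypothetical non-linear concept: squared distance from the origin.
    return float(np.sum(x ** 2))


def model(x):
    # Hypothetical model whose output depends on x only through the concept.
    return 3.0 * concept(x)


def concept_gradient(model_fn, concept_fn, x):
    """Assumed single-concept form: pseudo-inverse of grad g applied to grad f."""
    grad_f = numerical_gradient(model_fn, x)
    grad_g = numerical_gradient(concept_fn, x)
    return float(grad_g @ grad_f / (grad_g @ grad_g))


x = np.array([1.0, 2.0])
cg = concept_gradient(model, concept, x)
print(cg)
```

Because the toy model is exactly three times the concept, the sketch should recover a sensitivity close to 3 at any input where the concept gradient is non-zero, even though the concept is not a linear function of x.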
