Paper title
Framework for Evaluating Faithfulness of Local Explanations
Paper authors
Paper abstract
We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.