Paper Title


Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach

Authors

Carlos Fernández-Loría, Foster Provost, Xintian Han

Abstract


We examine counterfactual explanations for explaining the decisions made by model-based AI systems. The counterfactual approach we consider defines an explanation as a set of the system's data inputs that causally drives the decision (i.e., changing the inputs in the set changes the decision) and is irreducible (i.e., changing any subset of the inputs does not change the decision). We (1) demonstrate how this framework may be used to provide explanations for decisions made by general, data-driven AI systems that may incorporate features with arbitrary data types and multiple predictive models, and (2) propose a heuristic procedure to find the most useful explanations depending on the context. We then contrast counterfactual explanations with methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME) and present two fundamental reasons why we should carefully consider whether importance-weight explanations are well-suited to explain system decisions. Specifically, we show that (i) features that have a large importance weight for a model prediction may not affect the corresponding decision, and (ii) importance weights are insufficient to communicate whether and how features influence decisions. We demonstrate this with several concise examples and three detailed case studies that compare the counterfactual approach with SHAP to illustrate various conditions under which counterfactual explanations explain data-driven decisions better than importance weights.
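The abstract's definition (a set of inputs that causally drives the decision, and is irreducible in that no proper subset flips it) can be illustrated with a small sketch. The toy scoring model, the feature names, and the smallest-subset-first search below are hypothetical illustrations, not the paper's implementation; searching subsets in increasing size guarantees that the first flipping set found is irreducible.

```python
from itertools import combinations

def decision(x):
    # Hypothetical toy model: approve (True) if the score reaches 1.0.
    score = 0.8 * x["income_high"] + 0.5 * x["owns_home"] + 0.3 * x["long_tenure"]
    return score >= 1.0

def counterfactual_explanation(x, flip):
    """Find a smallest set of inputs whose change flips the decision.

    `flip` maps each feature to a counterfactual value. Because we try
    smaller subsets before larger ones, the returned set is both causal
    (changing it flips the decision) and irreducible (no proper subset
    flips it), matching the abstract's definition.
    """
    original = decision(x)
    features = list(x)
    for size in range(1, len(features) + 1):   # smallest subsets first
        for subset in combinations(features, size):
            x_cf = dict(x, **{f: flip[f] for f in subset})
            if decision(x_cf) != original:
                return set(subset)
    return None  # no change of inputs flips the decision

x = {"income_high": 1, "owns_home": 1, "long_tenure": 0}
flip = {f: 1 - v for f, v in x.items()}   # flip each binary feature
print(counterfactual_explanation(x, flip))  # → {'income_high'}
```

Here flipping `income_high` alone drops the score from 1.3 to 0.5 and reverses the decision, so it forms a complete, irreducible explanation even though other features also carry nonzero weight, echoing the paper's point (i) that a feature's importance weight need not translate into influence on the decision.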
