Paper Title

ExSum: From Local Explanations to Model Understanding

Authors

Yilun Zhou, Marco Tulio Ribeiro, Julie Shah

Abstract

Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them. While the former has been addressed in prior work, the latter is often overlooked, resulting in informal model understanding derived from a handful of local explanations. In this paper, we introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding, and propose metrics for its quality assessment. On two domains, ExSum highlights various limitations in the current practice, helps develop accurate model understanding, and reveals easily overlooked properties of the model. We also connect understandability to other properties of explanations such as human alignment, robustness, and counterfactual minimality and plausibility.
