Paper Title

Human-interpretable model explainability on high-dimensional data

Authors

Damien de Mijolla, Christopher Frye, Markus Kunesch, John Mansir, Ilya Feige

Abstract

The importance of explainability in machine learning continues to grow, as both neural-network architectures and the data they model become increasingly complex. Unique challenges arise when a model's input features become high dimensional: on one hand, principled model-agnostic approaches to explainability become too computationally expensive; on the other, more efficient explainability algorithms lack natural interpretations for general users. In this work, we introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules. First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data, and to ensure its human interpretability. These latent features can be learnt, e.g. explicitly as disentangled representations or implicitly through image-to-image translation, or they can be based on any computable quantities the user chooses. Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable. We benchmark our approach on synthetic data and demonstrate its effectiveness on several image-classification tasks.
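The second module applies the Shapley paradigm, in which a feature's attribution is its average marginal contribution across coalitions of the other features, to latent features rather than raw pixels. The sketch below is a minimal illustration of that idea, not the authors' implementation: it computes exact Shapley values over a low-dimensional latent code, where a coalition keeps selected latent features from the encoded input, resets the rest to a baseline, decodes, and scores the result with the model being explained. The names encode, decode, model, and baseline_z are hypothetical placeholders for a disentangled encoder/decoder pair and the classifier.

```python
import itertools
import math

import numpy as np


def shapley_on_latents(x, encode, decode, model, baseline_z):
    """Exact Shapley attributions over the latent features of input x.

    Coalition payoff v(S): keep the latent features in S at their encoded
    values, reset the rest to `baseline_z`, decode, and take the model's
    scalar score on the reconstruction.
    """
    z = encode(x)            # semantically meaningful latent code (1-D array)
    d = len(z)

    def payoff(subset):
        z_masked = baseline_z.copy()
        idx = list(subset)
        z_masked[idx] = z[idx]
        return model(decode(z_masked))

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):   # coalition sizes 0 .. d-1
            # standard Shapley weight |S|! (d - |S| - 1)! / d!
            w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                phi[i] += w * (payoff(S + (i,)) - payoff(S))
    return phi
```

Because the latent code is low-dimensional, the exact enumeration over all 2^d coalitions above stays tractable, which is the computational advantage the abstract highlights; sampling-based Shapley estimators would be the natural substitute if d were larger.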
