Paper Title
How does this interaction affect me? Interpretable attribution for feature interactions
Paper Authors
Paper Abstract
Machine learning transparency calls for interpretable explanations of how inputs relate to predictions. Feature attribution is a way to analyze the impact of features on predictions. Feature interactions are contextual dependencies between features that jointly impact predictions. There are a number of methods that extract feature interactions in prediction models; however, the methods that assign attributions to interactions are either uninterpretable, model-specific, or non-axiomatic. We propose an interaction attribution and detection framework called Archipelago, which addresses these problems and is also scalable in real-world settings. Our experiments on standard annotation labels indicate that our approach provides significantly more interpretable explanations than comparable methods, which is important for analyzing the impact of interactions on predictions. We also provide accompanying visualizations of our approach that give new insights into deep neural networks.
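To make the notion of a feature interaction concrete, the sketch below illustrates a generic pairwise interaction test on a black-box model: two features interact when their joint effect on the prediction differs from the sum of their individual effects relative to a baseline input. This is only a minimal illustration of the concept described in the abstract, not the Archipelago algorithm itself; the function names (`pairwise_interaction_effect`, `toy_model`) and the choice of a zero baseline are assumptions for the example.

```python
import numpy as np

def pairwise_interaction_effect(f, x, baseline, i, j):
    """Mixed-difference test for a pairwise feature interaction.

    Measures how much features i and j jointly change the prediction
    beyond their individual effects, relative to a baseline input.
    Generic illustration only; not the paper's Archipelago method.
    """
    def patch(indices):
        # Start from the baseline and insert the target values at `indices`.
        idx = list(indices)
        z = baseline.copy()
        z[idx] = x[idx]
        return f(z.reshape(1, -1))[0]

    # f(both) - f(only i) - f(only j) + f(neither); nonzero => interaction.
    return patch({i, j}) - patch({i}) - patch({j}) + patch(set())

if __name__ == "__main__":
    # Toy multiplicative model: features 0 and 1 interact, feature 2 is additive.
    toy_model = lambda X: X[:, 0] * X[:, 1] + X[:, 2]
    x = np.array([2.0, 3.0, 5.0])
    baseline = np.zeros(3)
    print(pairwise_interaction_effect(toy_model, x, baseline, 0, 1))  # 6.0 (interaction)
    print(pairwise_interaction_effect(toy_model, x, baseline, 0, 2))  # 0.0 (no interaction)
```

In this toy setting, the nonzero value for the feature pair (0, 1) reflects their multiplicative coupling, while the additive feature 2 shows no interaction with feature 0; interaction attribution methods such as the one described in the abstract aim to assign such joint effects interpretable scores at scale.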