Paper Title
An interpretable neural network model through piecewise linear approximation
Paper Authors
Paper Abstract
Most existing interpretable methods explain a black-box model in a post-hoc manner, using simpler models or data analysis techniques to interpret the predictions after the model has been learned. However, such methods (a) may derive contradictory explanations for the same prediction when given different methods and data samples, and (b) focus on using simpler models to provide higher descriptive accuracy at the expense of prediction accuracy. To address these issues, we propose a hybrid interpretable model that combines a piecewise linear component and a nonlinear component. The first component describes the explicit feature contributions by piecewise linear approximation to increase the expressiveness of the model. The other component uses a multi-layer perceptron to capture feature interactions and implicit nonlinearity, and to increase prediction performance. Unlike post-hoc approaches, interpretability is obtained, in the form of feature shapes, as soon as the model is learned. We also provide a variant that explores higher-order interactions among features, demonstrating that the proposed model is flexible to adapt. Experiments show that the proposed model can achieve good interpretability by describing feature shapes while maintaining state-of-the-art accuracy.
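The abstract describes the architecture only at a high level. Below is a minimal, illustrative PyTorch sketch of one way such a hybrid could be realized; it is an assumption on our part, not the authors' implementation. Each feature's contribution is modeled as a piecewise linear function built from hinge (ReLU) basis terms at fixed knots, and a small multi-layer perceptron adds implicit nonlinearity and feature interactions. All names here (HybridInterpretableNet, n_knots, hidden) are hypothetical.

import torch
import torch.nn as nn

class HybridInterpretableNet(nn.Module):
    def __init__(self, n_features: int, n_knots: int = 8, hidden: int = 32):
        super().__init__()
        # Fixed knots per feature (evenly spaced on [0, 1] here; feature
        # quantiles would be a natural alternative).
        self.register_buffer(
            "knots", torch.linspace(0.0, 1.0, n_knots).repeat(n_features, 1)
        )  # shape (F, K)
        # One weight per (feature, knot) hinge, plus a per-feature linear term.
        self.hinge_w = nn.Parameter(torch.zeros(n_features, n_knots))
        self.linear = nn.Linear(n_features, 1)
        # Nonlinear component: a small MLP over all features jointly.
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def feature_shapes(self, x):
        # Per-feature piecewise linear contributions, shape (B, F). Plotting
        # x_j against column j gives the "feature shape" that shows how
        # feature j drives the prediction.
        hinges = torch.relu(x.unsqueeze(-1) - self.knots)  # (B, F, K)
        return (hinges * self.hinge_w).sum(-1) + self.linear.weight.squeeze(0) * x

    def forward(self, x):
        # Prediction = explicit piecewise linear part + implicit MLP part.
        pw = self.feature_shapes(x).sum(-1, keepdim=True) + self.linear.bias
        return pw + self.mlp(x)

model = HybridInterpretableNet(n_features=10)
x = torch.rand(4, 10)
print(model(x).shape)                  # torch.Size([4, 1])
print(model.feature_shapes(x).shape)   # torch.Size([4, 10])

Because the piecewise linear part is additive over features, its interpretability survives joint training with the MLP: the learned feature shapes can be read off directly after training, while the MLP only accounts for whatever the additive part cannot express.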