BELLATREX: Building Explanations through a LocaLly AccuraTe Rule EXtractor

Title

BELLATREX: Building Explanations through a LocaLly AccuraTe Rule EXtractor

Authors

Dedja, Klest; Nakano, Felipe Kenji; Pliakos, Konstantinos; Vens, Celine

Abstract

Tree-ensemble algorithms, such as random forest, are effective machine learning methods popular for their flexibility, high performance, and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose a novel method that is Building Explanations through a LocaLly AccuraTe Rule EXtractor (Bellatrex), and is able to explain the forest prediction for a given test instance with only a few diverse rules. Starting from the decision trees generated by a random forest, our method 1) pre-selects a subset of the rules used to make the prediction, 2) creates a vector representation of such rules, 3) projects them to a low-dimensional space, and 4) clusters such representations to pick a rule from each cluster to explain the instance prediction. We test the effectiveness of Bellatrex on 89 real-world datasets and demonstrate the validity of our method for binary classification, regression, multi-label classification, and time-to-event tasks. To the best of our knowledge, this is the first time that an interpretability toolbox can handle all these tasks within the same framework. We also show that our extracted surrogate model can approximate the performance of the corresponding ensemble model in all considered tasks, while selecting only a few trees from the whole forest. Finally, we show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
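The four steps listed in the abstract can be sketched in plain Python for a single test instance. This is a minimal illustration under our own assumptions, not the authors' implementation. Several choices are assumptions, not taken from the abstract:

- the pre-selection criterion (keep the trees whose prediction is closest to the forest average);
- the rule vectors, which the caller supplies;
- PCA computed by power iteration as the projection;
- k-means with farthest-point initialization as the clustering;
- a surrogate prediction that averages the picked trees, weighted by cluster size.

The helper names bellatrex_sketch, pca_project, kmeans and dist2 are hypothetical.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pca_project(X, n_dims, iters=200):
    """Project the rows of X onto their top principal components.

    Uses power iteration with deflation on the covariance matrix, so only the
    standard library is needed (assumed stand-in for a proper PCA routine).
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # centred data
    C = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(min(n_dims, d)):
        v = [1.0 + 0.1 * j for j in range(d)]  # deterministic start vector
        norm = 1.0
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # (almost) no variance left to explain
                break
            v = [x / norm for x in w]
        if norm < 1e-12:
            break
        # Rayleigh quotient gives the eigenvalue for the converged direction v.
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in Xc]


def kmeans(P, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [P[0]]
    while len(centers) < k:
        centers.append(max(P, key=lambda p: min(dist2(p, c) for c in centers)))

    def assign(cs):
        return [min(range(len(cs)), key=lambda c: dist2(p, cs[c])) for p in P]

    for _ in range(iters):
        labels = assign(centers)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(P, labels) if lab == c]
            # Keep the old centre if a cluster happens to be empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centers[c])
        if new == centers:
            break
        centers = new
    return assign(centers), centers


def bellatrex_sketch(tree_preds, rule_vectors, n_keep=10, n_dims=2, n_clusters=2):
    """Hypothetical Bellatrex-style explanation for ONE test instance.

    tree_preds[t]   : prediction of tree t for the instance
    rule_vectors[t] : vector describing the rule (decision path) tree t used
    """
    forest_pred = sum(tree_preds) / len(tree_preds)
    # 1) Pre-select trees closest to the forest prediction (assumed criterion).
    keep = sorted(range(len(tree_preds)),
                  key=lambda t: abs(tree_preds[t] - forest_pred))[:n_keep]
    # 2) Vector representation of each kept rule (supplied by the caller).
    X = [rule_vectors[t] for t in keep]
    # 3) Project to a low-dimensional space (PCA, assumed).
    P = pca_project(X, n_dims)
    # 4) Cluster, then pick the rule closest to each cluster centre.
    labels, centers = kmeans(P, min(n_clusters, len(P)))
    picked, weights = [], []
    for c, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        best = min(members, key=lambda i: dist2(P[i], center))
        picked.append(keep[best])
        weights.append(len(members) / len(P))
    # Surrogate prediction: cluster-size-weighted average (assumed).
    surrogate = sum(w * tree_preds[t] for w, t in zip(weights, picked))
    return picked, surrogate, forest_pred
```

Consider six trees with preds = [2.0, 2.5, 3.5, 4.0, 9.0, -3.0] and rule vectors [[2, 0, 0], [2, 0, 1], [0, 2, 0], [0, 2, 1], [1, 1, 1], [1, 1, 1]]. Calling bellatrex_sketch(preds, vecs, n_keep=4) drops the two outlying trees (4 and 5). It groups the remaining four trees into two clusters by rule similarity and returns one representative tree from each cluster. Weighting by cluster size lets a large group of similar rules count for more than a single unusual one in the surrogate prediction.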

Title