Visualization of Decision Trees based on General Line Coordinates to Support Explainable Models

Paper title

Visualization of Decision Trees based on General Line Coordinates to Support Explainable Models

Authors

Alex Worland, Sridevi Wagle, Boris Kovalerchuk

Abstract

Visualization of Machine Learning (ML) models is an important part of the ML process that enhances the interpretability and prediction accuracy of the models. This paper proposes a new method, SPC-DT, for visualizing Decision Trees (DTs) as interpretable models. The method uses a version of General Line Coordinates called Shifted Paired Coordinates (SPC). In SPC, each n-D point is visualized in a set of shifted pairs of 2-D Cartesian coordinates as a directed graph. The new method expands and complements the capabilities of existing methods for visualizing DT models. It shows: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) how tight each split is to the thresholds in the DT nodes, and (5) the density of cases in parts of the n-D space. This information is important for domain experts evaluating and improving DT models and their performance, including avoiding overgeneralization and overfitting. The benefits of the method are demonstrated in case studies using three real datasets.
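To make the SPC idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an even number of attributes, pairs them consecutively as (x1, x2), (x3, x4), and so on, and places each pair in its own 2-D Cartesian system whose origin is moved by a caller-supplied shift. The function names `spc_nodes` and `spc_edges` and the `shifts` parameter are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_nodes(point: Sequence[float], shifts: Sequence[Point2D]) -> List[Point2D]:
    """Map an n-D point to one 2-D node per consecutive attribute pair.

    Assumption: n is even and shifts[k] is the origin offset of the
    k-th paired coordinate system on the drawing canvas.
    """
    if len(point) % 2 != 0:
        raise ValueError("this sketch assumes an even number of attributes")
    pairs = [(point[i], point[i + 1]) for i in range(0, len(point), 2)]
    if len(shifts) != len(pairs):
        raise ValueError("need exactly one shift per attribute pair")
    # Each pair is drawn in its own shifted Cartesian system.
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pairs, shifts)]


def spc_edges(nodes: Sequence[Point2D]) -> List[Tuple[Point2D, Point2D]]:
    """Connect consecutive nodes with directed edges (source, target)."""
    return list(zip(nodes, nodes[1:]))


# A 6-D case becomes a directed path through three shifted 2-D plots.
case = [1, 2, 3, 4, 5, 6]
nodes = spc_nodes(case, shifts=[(0, 0), (10, 0), (20, 0)])
edges = spc_edges(nodes)
```

In this sketch a single case is a short directed path, so drawing many cases over the same shifted axes would show how they cluster relative to one another. How SPC-DT overlays decision tree thresholds on these plots is not specified in the abstract and is not modeled here.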
