Explainable Global Error Weighted on Feature Importance: The xGEWFI metric to evaluate the error of data imputation and data augmentation

Paper Title

Explainable Global Error Weighted on Feature Importance: The xGEWFI metric to evaluate the error of data imputation and data augmentation

Authors

Dessureault, Jean-Sébastien; Massicotte, Daniel

Abstract

Evaluating the performance of an algorithm is crucial. Data imputation and data augmentation can be evaluated in similar ways, since in both cases the generated data can be compared with an original distribution. However, the typical evaluation metrics share the same flaw: they calculate each feature's error and the global error on the generated data without weighting the error by feature importance. The result can be good if all features have similar importance. In most cases, however, feature importance is imbalanced, which can induce a substantial bias in both the per-feature and the global errors. This paper proposes a novel metric named "Explainable Global Error Weighted on Feature Importance" (xGEWFI). The new metric is tested within a complete preprocessing method that (1) detects outliers and replaces them with null values, (2) imputes the missing data, and (3) augments the data. At the end of the process, the xGEWFI error is calculated. The distribution error between the original and generated data is calculated using a Kolmogorov-Smirnov test (KS test) for each feature. Those results are multiplied by the importance of the respective features, calculated using a Random Forest (RF) algorithm. The metric result is expressed in an explainable format, aiming for ethical AI.
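The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the per-feature error is the two-sample KS statistic, that the feature importances have already been obtained (for example from a Random Forest) and are passed in as a plain dictionary, and that the global error is the sum of the weighted per-feature errors. The abstract does not state the aggregation, so the sum is an assumption, and all function names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def weighted_feature_errors(original, generated, importances):
    """Per-feature KS statistic multiplied by that feature's importance.

    original, generated: dict mapping feature name -> list of values.
    importances: dict mapping feature name -> importance weight
    (assumed to be precomputed, e.g. by a Random Forest).
    """
    return {
        name: ks_statistic(original[name], generated[name]) * weight
        for name, weight in importances.items()
    }


def global_weighted_error(original, generated, importances):
    """Assumed aggregation: sum of the weighted per-feature errors."""
    return sum(weighted_feature_errors(original, generated, importances).values())


if __name__ == "__main__":
    # Toy data: "age" is reproduced exactly, "income" is shifted entirely.
    original = {"age": [20, 30, 40, 50], "income": [1.0, 2.0, 3.0, 4.0]}
    generated = {"age": [20, 30, 40, 50], "income": [5.0, 6.0, 7.0, 8.0]}
    importances = {"age": 0.9, "income": 0.1}
    print(weighted_feature_errors(original, generated, importances))
    print(global_weighted_error(original, generated, importances))
```

In this toy case the badly generated feature carries little importance, so it contributes only a small share of the global error. An unweighted average of the KS statistics would instead treat both features equally, which is the bias the abstract describes.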
