Paper title
Towards Ground Truth Explainability on Tabular Data
Paper authors
Paper abstract
In data science, there is a long history of using synthetic data for method development, feature selection, and feature engineering. Our current interest in synthetic data comes from recent work in explainability. Today's datasets are typically larger and more complex, and so call for less interpretable models. In the setting of \textit{post hoc} explainability, there is no ground truth for explanations. Inspired by recent work on explaining image classifiers that does provide ground truth, we propose a similar solution for tabular data. Using copulas, a concise specification of the desired statistical properties of a dataset, users can build intuition around explainability through controlled datasets and experimentation. The current capabilities are demonstrated on three use cases: one-dimensional logistic regression, the impact of correlation from informative features, and the impact of correlation from redundant variables.
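To make the copula idea concrete, the following is a minimal sketch of how a two-feature synthetic dataset with a known ground truth might be generated. It is not the authors' implementation: the function names (`gaussian_copula_pairs`, `make_dataset`), the choice of a Gaussian copula, the marginals (Exponential(1) and Uniform(0, 10)), and the logistic coefficients are all assumptions made for illustration. The label depends only on `x1`, so `x2` acts as a redundant variable whose correlation with the informative feature is controlled by `rho`.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_pairs(n, rho, seed=0):
    """Draw n pairs (u1, u2) in [0, 1]^2 whose dependence follows a
    Gaussian copula with correlation parameter rho (illustrative helper)."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0 and sigma 1
    pairs = []
    for _ in range(n):
        # Correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform marginal.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def make_dataset(n, rho, seed=0):
    """Return rows (x1, x2, y). Assumed setup: x1 ~ Exponential(1) is the
    only informative feature, x2 ~ Uniform(0, 10) is redundant, and y is
    drawn from a logistic model on x1 alone."""
    label_rng = random.Random(seed + 1)
    rows = []
    for u1, u2 in gaussian_copula_pairs(n, rho, seed):
        # Inverse-CDF transforms impose the chosen marginals.
        x1 = -math.log(max(1.0 - u1, 1e-300))  # Exponential(1), clamped
        x2 = 10.0 * u2                         # Uniform(0, 10)
        # Ground truth: the label probability ignores x2 entirely.
        p = 1.0 / (1.0 + math.exp(-(2.0 * x1 - 2.0)))
        y = 1 if label_rng.random() < p else 0
        rows.append((x1, x2, y))
    return rows
```

Calling `make_dataset(2000, 0.8)` and `make_dataset(2000, 0.0)` gives two datasets that differ only in how strongly the redundant feature tracks the informative one. Because the true attribution to `x2` is zero by construction, any importance an explainer assigns to it can be read as an artifact of correlation.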