Title

Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization

Author

Sun, Qingyao

Abstract

While $\ell_2$ regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees trained with $\ell_2$ regularization. Theoretical analysis shows two things. First, the inner product between PreDecomp and the labels on in-sample data is essentially the total gain of a tree. Second, PreDecomp can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions. TreeInner is defined, for each tree, as the inner product between any individualized feature attribution and the labels on out-of-sample data. Numerical experiments on a simulated dataset and a genomic ChIP dataset show that TreeInner has state-of-the-art feature selection performance. Code for reproducing the experiments is available at https://github.com/nalzok/TreeInner .
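The TreeInner definition can be made concrete with a minimal Python/NumPy sketch. It rests on several assumptions:

- It uses a hypothetical dict-based tree layout. This is not the format of XGBoost, LightGBM, or the authors' repository.
- It plugs in Saabas as the individualized attribution, because the abstract does not give the PreDecomp formula.
- The global score for feature k is the sum, over trees, of the inner product between that feature's attributions and the labels on held-out data.

The abstract does not say whether each tree is scored against the raw labels or against boosting residuals, or how scores are normalized. Read this only as an illustration of the abstract's wording, not as the paper's implementation.

```python
import numpy as np

# Hypothetical tree layout (an assumption for illustration only), a list of dicts:
#   internal node: {"feature": k, "threshold": s, "left": i, "right": j, "value": v}
#   leaf node:     {"value": v}
# "value" is the node's prediction; node 0 is the root.

def saabas(tree, x, n_features):
    """Saabas attribution for one sample x: every split on the decision path
    credits (child value - parent value) to the feature it splits on, so
    root value + sum(phi) equals the leaf value reached by x."""
    phi = np.zeros(n_features)
    node = tree[0]
    while "feature" in node:
        k = node["feature"]
        child = tree[node["left"]] if x[k] <= node["threshold"] else tree[node["right"]]
        phi[k] += child["value"] - node["value"]
        node = child
    return phi

def tree_inner(trees, X_out, y_out, attribution=saabas):
    """Sketch of a TreeInner-style global score: for each tree, take the
    inner product between each feature's attributions on out-of-sample
    data and the out-of-sample labels, then sum over trees."""
    n, p = X_out.shape
    scores = np.zeros(p)
    for tree in trees:
        phi = np.stack([attribution(tree, x, p) for x in X_out])  # shape (n, p)
        scores += phi.T @ y_out                                     # shape (p,)
    return scores

# Toy example: a single stump that splits on feature 0 at 0.5.
tree = [
    {"feature": 0, "threshold": 0.5, "left": 1, "right": 2, "value": 0.0},
    {"value": -1.0},
    {"value": 1.0},
]
X_out = np.array([[0.2, 0.9], [0.8, 0.1], [0.9, 0.7], [0.1, 0.3]])
y_out = np.array([-1.0, 1.0, 1.0, -1.0])

scores = tree_inner([tree], X_out, y_out)
print(scores)  # [4. 0.]
```

Feature 1 is never split on, so it receives a score of exactly zero. Feature 0's Saabas attributions agree in sign with every held-out label, so its score is positive. Because the scores are computed on held-out data, a feature that splits only on noise would tend to get attributions uncorrelated with the labels. This is the intuition behind calling such scores debiased.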