Paper title
Unbiased variable importance for random forests
Paper authors
Paper abstract
The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple solution to the misleading/untrustworthy Gini importance, which can be viewed as an overfitting problem: we compute the loss reduction on the out-of-bag samples instead of the in-bag training samples.
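The contrast between in-bag and out-of-bag loss reduction can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes binary labels, a single split on one pure-noise feature, and hypothetical helper names (gini, impurity_reduction, best_threshold, inbag_vs_oob_gain). The threshold is chosen on the bootstrap (in-bag) sample, and the resulting Gini impurity reduction is then recomputed on the out-of-bag rows that the split never saw. Note that a plain impurity reduction is non-negative on any sample, so this only shows the optimism of the in-bag value, not a fully unbiased measure.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def impurity_reduction(xs, ys, threshold):
    """Decrease in Gini impurity when splitting the sample at x <= threshold."""
    n = len(ys)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(ys) - children


def best_threshold(xs, ys):
    """Threshold with the largest impurity reduction on the given sample."""
    return max(sorted(set(xs)), key=lambda t: impurity_reduction(xs, ys, t))


def inbag_vs_oob_gain(xs, ys, rng):
    """Fit one split on a bootstrap sample; score it in-bag and out-of-bag."""
    n = len(ys)
    inbag = [rng.randrange(n) for _ in range(n)]   # bootstrap row indices
    oob = sorted(set(range(n)) - set(inbag))       # rows never drawn
    x_in, y_in = [xs[i] for i in inbag], [ys[i] for i in inbag]
    x_oob, y_oob = [xs[i] for i in oob], [ys[i] for i in oob]
    t = best_threshold(x_in, y_in)                 # split chosen in-bag only
    return impurity_reduction(x_in, y_in, t), impurity_reduction(x_oob, y_oob, t)


# Assumed toy data: the feature carries no information about the label.
rng = random.Random(0)
n = 200
xs = [rng.random() for _ in range(n)]
ys = [rng.randrange(2) for _ in range(n)]

gains = [inbag_vs_oob_gain(xs, ys, rng) for _ in range(50)]
mean_in = sum(g[0] for g in gains) / len(gains)
mean_oob = sum(g[1] for g in gains) / len(gains)
print(f"mean in-bag gain: {mean_in:.4f}   mean out-of-bag gain: {mean_oob:.4f}")
```

Because the threshold is optimized on the in-bag rows, the in-bag gain credits the noise feature with impurity reduction that reflects fitting to that particular sample; scoring the same split on held-out rows removes the selection step from the evaluation.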