Paper title

(Decision and regression) tree ensemble based kernels for regression and classification

Authors

Feng, Dai; Baumgartner, Richard

Abstract

Tree-based ensembles such as Breiman's random forest (RF) and Gradient Boosted Trees (GBT) can be interpreted as implicit kernel generators, where the ensuing proximity matrix represents the data-driven tree ensemble kernel. The kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. Recently, it has been shown that the kernel interpretation is germane to other tree-based ensembles, e.g. GBTs. However, the practical utility of the links between kernels and tree ensembles has not been widely explored or systematically evaluated. The focus of our work is the interplay between kernel methods and tree-based ensembles, including the RF and GBT. We elucidate the performance and properties of the RF- and GBT-based kernels in a comprehensive simulation study comprising continuous and binary targets. We show that for continuous targets, the RF/GBT kernels are competitive with their respective ensembles in higher-dimensional scenarios, particularly in cases with a larger number of noisy features. For binary targets, the RF/GBT kernels and their respective ensembles exhibit comparable performance. We provide results from real-life data sets for regression and classification to show how these insights may be leveraged in practice. Overall, our results support tree ensemble based kernels as a valuable addition to the practitioner's toolbox. Finally, we discuss extensions of the tree ensemble based kernels to survival targets, interpretable prototype methods, and landmarking classification and regression. We outline a future line of research on kernels furnished by Bayesian counterparts of the frequentist tree ensembles.
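
To make the "proximity matrix as kernel" idea concrete, here is a minimal sketch of one common way such a kernel is defined: the kernel value for two samples is the fraction of trees in which they fall into the same leaf. This is an illustration under stated assumptions, not necessarily the exact construction used in the paper. The function name `proximity_kernel` and the toy leaf assignments are hypothetical; in practice the leaf indices might come from a fitted ensemble, for example scikit-learn's `RandomForestRegressor.apply(X)`.

```python
from typing import List, Sequence


def proximity_kernel(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the n x n proximity matrix for n samples.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    Entry (i, j) is the fraction of trees in which samples i and j share a leaf.
    """
    n = len(leaf_ids)
    if n == 0:
        return []
    n_trees = len(leaf_ids[0])
    if n_trees == 0:
        raise ValueError("at least one tree is required")

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees where both samples land in the same leaf.
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            # The matrix is symmetric, so fill both halves at once.
            kernel[i][j] = kernel[j][i] = shared / n_trees
    return kernel


# Hypothetical leaf assignments: 4 samples, 3 trees (illustrative data only).
leaf_ids = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 2, 3],
    [1, 0, 0],
]
K = proximity_kernel(leaf_ids)
```

The resulting matrix is symmetric with ones on the diagonal, so it could be passed to any kernel method that accepts a precomputed kernel.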
