Title

Consistency of Oblique Decision Tree and its Boosting and Random Forest

Authors

Haoran Zhan, Yu Liu, Yingcun Xia

Abstract

Classification and Regression Tree (CART), Random Forest (RF) and Gradient Boosting Tree (GBT) are probably the most popular set of statistical learning methods. However, their statistical consistency can only be proved under very restrictive assumptions on the underlying regression function. As an extension to standard CART, the oblique decision tree (ODT), which uses linear combinations of predictors as partitioning variables, has received much attention. ODT tends to perform numerically better than CART and requires fewer partitions. In this paper, we show that ODT is consistent for very general regression functions as long as they are $L^2$ integrable. Then, we prove the consistency of the ODT-based random forest (ODRF), whether fully grown or not. Finally, we propose an ensemble of GBT for regression by borrowing the technique of orthogonal matching pursuit and study its consistency under very mild conditions on the tree structure. After refining existing computer packages according to the established theory, extensive experiments on real data sets show that both our ensemble boosting trees and ODRF have noticeable overall improvements over RF and other forests.
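The abstract's central object, the oblique split, is easy to illustrate. Below is a minimal sketch, not the paper's algorithm: it compares the best single-coordinate (CART-style) split against a split on a linear combination of the predictors, the kind of partitioning variable an ODT uses. The helper names (`split_sse`, `best_threshold`) and the choice of a least-squares direction for the linear combination are illustrative assumptions; the paper's own direction search and stopping rules are not reproduced here.

```python
import numpy as np

def split_sse(z, y, t):
    """Sum of squared errors after thresholding the scalar scores z at t."""
    sse = 0.0
    for part in (y[z <= t], y[z > t]):
        if part.size:
            sse += np.sum((part - part.mean()) ** 2)
    return sse

def best_threshold(z, y):
    """Scan midpoints of the sorted unique scores for the SSE-minimizing cut."""
    zs = np.unique(z)
    candidates = (zs[:-1] + zs[1:]) / 2
    sses = [split_sse(z, y, t) for t in candidates]
    i = int(np.argmin(sses))
    return candidates[i], sses[i]

# Toy data whose regression surface depends on x1 + x2, so a single oblique
# cut can capture what axis-aligned cuts need several partitions to approximate.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0) + 0.1 * rng.standard_normal(500)

# Axis-aligned (CART-style): best split on a single coordinate.
cart_sse = min(best_threshold(X[:, j], y)[1] for j in range(X.shape[1]))

# Oblique (ODT-style): pick a least-squares direction w, then split on X @ w.
# (An illustrative stand-in for the direction search an ODT actually performs.)
w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
oblique_sse = best_threshold(X @ w, y)[1]

print(f"best axis-aligned split SSE: {cart_sse:.2f}")
print(f"oblique split SSE:           {oblique_sse:.2f}")
```

On data like this, the oblique split leaves a much smaller residual SSE than any single axis-aligned cut, which matches the abstract's claim that ODT tends to perform better numerically and requires fewer partitions.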
