Estimation and Inference with Trees and Forests in High Dimensions

Title

Estimation and Inference with Trees and Forests in High Dimensions

Authors

Vasilis Syrgkanis, Manolis Zampetakis

Abstract

We analyze the finite-sample mean squared error (MSE) performance of regression trees and forests in the high-dimensional regime with binary features, under a sparsity constraint. We prove that if only $r$ of the $d$ features are relevant for the mean outcome function, then shallow trees built greedily via the CART empirical MSE criterion achieve MSE rates that depend only logarithmically on the ambient dimension $d$. We prove upper bounds whose exact dependence on the number of relevant variables $r$ depends on the correlation among the features and on the degree of relevance. For strongly relevant features, we also show that fully grown honest forests achieve fast MSE rates and that their predictions are asymptotically normal, enabling asymptotically valid inference that adapts to the sparsity of the regression function.
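
The greedy CART construction and honest averaging described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names (sse, best_split, grow_tree, leaf_path, honest_tree_predict, honest_forest_predict) and parameters (depth, n_trees, subsample, seed) are hypothetical. Assumptions: features are 0/1, each node is split on the single feature that most reduces the empirical sum of squared errors (the standard per-node CART rule; the tree variants analyzed in the paper may differ), and honesty is implemented by splitting each random subsample in half, growing the tree on one half and estimating the leaf mean from the other.

```python
import random


def sse(ys):
    """Sum of squared deviations of ys from their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y, idx):
    """Binary feature whose 0/1 split of rows idx most reduces the empirical
    SSE (equivalently, the empirical MSE criterion); None if no split helps."""
    parent = sse([y[i] for i in idx])
    best_j, best_gain = None, 0.0
    for j in range(len(X[0])):
        left = [y[i] for i in idx if X[i][j] == 0]
        right = [y[i] for i in idx if X[i][j] == 1]
        if not left or not right:
            continue  # feature is constant within this node
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j


def grow_tree(X, y, idx, depth):
    """Greedy depth-limited tree: None is a leaf, (j, left, right) splits on x[j]."""
    if depth == 0 or len(idx) < 2:
        return None
    j = best_split(X, y, idx)
    if j is None:
        return None
    left = [i for i in idx if X[i][j] == 0]
    right = [i for i in idx if X[i][j] == 1]
    return (j, grow_tree(X, y, left, depth - 1), grow_tree(X, y, right, depth - 1))


def leaf_path(tree, x):
    """(feature, value) tests that x passes on its way from the root to a leaf."""
    path = []
    while tree is not None:
        j, left, right = tree
        path.append((j, x[j]))
        tree = right if x[j] == 1 else left
    return path


def honest_tree_predict(X, y, split_idx, est_idx, x, depth):
    """Grow on split_idx, then average y over est_idx rows sharing x's leaf."""
    path = leaf_path(grow_tree(X, y, split_idx, depth), x)
    same = [y[i] for i in est_idx if all(X[i][j] == v for j, v in path)]
    return sum(same) / len(same) if same else None


def honest_forest_predict(X, y, x, n_trees=50, subsample=0.5, depth=3, seed=0):
    """Average honest-tree predictions over random subsamples without replacement."""
    rng = random.Random(seed)
    n = len(y)
    s = max(2, int(subsample * n))
    preds = []
    for _ in range(n_trees):
        sub = rng.sample(range(n), s)
        half = s // 2
        p = honest_tree_predict(X, y, sub[:half], sub[half:], x, depth)
        if p is not None:  # skip trees whose leaf got no estimation points
            preds.append(p)
    return sum(preds) / len(preds) if preds else None


if __name__ == "__main__":
    rng = random.Random(1)
    d, n = 10, 1000
    X = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    # Only features 0 and 1 are relevant: r = 2 of d = 10
    y = [2 * row[0] + row[1] + rng.gauss(0, 0.1) for row in X]
    print(best_split(X, y, list(range(n))))
    print(honest_forest_predict(X, y, [1] * d, n_trees=20))
    print(honest_forest_predict(X, y, [0] * d, n_trees=20))
```

In this synthetic setup, with only r = 2 of d = 10 features relevant, the root split should select feature 0, and the forest predictions at the all-ones and all-zeros points should land near the true means 3 and 0. Passing a large depth approximates a fully grown tree; a tree whose leaf receives no estimation-half points is simply skipped in the average, which is one simple choice among several.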
