Paper Title

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Authors

Domagoj Ćevid, Loris Michel, Jeffrey Näf, Nicolai Meinshausen, Peter Bühlmann

Abstract

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data, which can also be used for targets other than the original mean estimation. We propose a novel forest construction for multivariate responses based on their joint conditional distribution, independent of the estimation target and the data model. It uses a new splitting criterion based on the MMD distributional metric, which is suitable for detecting heterogeneity in multivariate distributions. The induced weights define an estimate of the full conditional distribution, which in turn can be used for arbitrary and potentially complicated targets of interest. The method is very versatile and convenient to use, as we illustrate on a wide range of examples. The code is available as Python and R packages drf.
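
To make the MMD-based splitting idea concrete, the following is a minimal sketch of how a candidate split of multivariate responses could be scored. It is not the drf package's implementation: the Gaussian kernel, the fixed bandwidth, the biased (V-statistic) MMD estimate, the size weighting, and the function names (`gaussian_kernel`, `mean_kernel`, `mmd_squared`, `split_score`) are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two response vectors (assumed kernel choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(left, right, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(left, left, bandwidth)
            + mean_kernel(right, right, bandwidth)
            - 2.0 * mean_kernel(left, right, bandwidth))


def split_score(left, right, bandwidth=1.0):
    """Hypothetical split score: squared MMD weighted by the child sizes."""
    n_l, n_r = len(left), len(right)
    return n_l * n_r / (n_l + n_r) ** 2 * mmd_squared(left, right, bandwidth)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two-dimensional responses falling into the two children of a candidate split.
    left = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    right = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(50)]
    print(split_score(left, right))
```

A split whose children have clearly different joint response distributions receives a larger score than one whose children look alike, which is the sense in which such a criterion detects heterogeneity in multivariate distributions.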
