Paper title
Flexible domain prediction using mixed effects random forests
Paper authors
Paper abstract
This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators when area-specific sample sizes are small. Small area estimators are predominantly conceptualized within the regression setting and rely on linear mixed models to account for the hierarchical structure of survey data. In contrast, machine learning methods offer non-linear and non-parametric alternatives that combine excellent predictive performance with a reduced risk of model misspecification. Mixed effects random forests combine the advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non-parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate the advantages of the proposed methodology using Mexican income data from the state of Nuevo León. Finally, the methodology is evaluated in model-based and design-based simulations that compare it with traditional regression-based approaches for estimating small area averages.
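As a rough illustration of the kind of model described above, the sketch below fits a mixed effects random forest with a single random intercept per area. It uses an EM-type loop that alternates between (1) fitting a forest to the response with the current area effects removed and (2) updating the area effects and variance components. Each area average is then estimated as the mean forest prediction over that area's population covariates plus the estimated area effect. This is a minimal sketch under stated assumptions, not the paper's exact procedure. The function names `fit_merf` and `predict_area_means` are hypothetical. The random-intercept-only structure, the hyperparameters, the use of out-of-bag predictions inside the loop, the zero effect for unsampled areas and the availability of population-level covariates are also assumptions made here. The non-parametric bootstrap estimator of uncertainty is not sketched. The code relies on scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_merf(X, y, area, n_iter=10, tol=1e-4, seed=0):
    """Hypothetical sketch: fit y_ij = f(x_ij) + u_i + e_ij with a random forest
    for f(.), u_i ~ N(0, s2_u) and e_ij ~ N(0, s2_e), by alternating forest fits
    with EM updates of the random intercepts and variance components."""
    X, y, area = np.asarray(X, float), np.asarray(y, float), np.asarray(area)
    areas, idx = np.unique(area, return_inverse=True)  # idx[j]: area index of unit j
    m, N = len(areas), len(y)
    n_i = np.bincount(idx, minlength=m).astype(float)
    u = np.zeros(m)
    s2_u = s2_e = np.var(y) / 2  # crude starting values (assumption)
    forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                                   oob_score=True, random_state=seed)
    for _ in range(n_iter):
        # (1) Fit the forest to the response with the current random effects removed.
        forest.fit(X, y - u[idx])
        f_hat = forest.oob_prediction_  # out-of-bag fit (a choice made in this sketch)
        # (2) Random-intercept BLUP: shrink each area's mean residual by gamma_i.
        gamma = s2_u / (s2_u + s2_e / n_i)
        r_bar = np.bincount(idx, weights=y - f_hat, minlength=m) / n_i
        u_new = gamma * r_bar
        # (3) EM updates of the variance components (random-intercept case).
        eps = y - f_hat - u_new[idx]
        s2_e_new = (eps @ eps + s2_e * gamma.sum()) / N
        s2_u_new = np.mean(u_new ** 2 + s2_u * (1 - gamma))
        done = np.max(np.abs(u_new - u)) < tol
        u, s2_u, s2_e = u_new, s2_u_new, s2_e_new
        if done:
            break
    return {"forest": forest, "u": dict(zip(areas, u)), "s2_u": s2_u, "s2_e": s2_e}


def predict_area_means(model, X_pop, area_pop):
    """Area mean = average forest prediction over the area's population units
    plus its estimated intercept (0 for areas without sampled units)."""
    f_pop = model["forest"].predict(np.asarray(X_pop, float))
    area_pop = np.asarray(area_pop)
    return {a: f_pop[area_pop == a].mean() + model["u"].get(a, 0.0)
            for a in np.unique(area_pop)}


# Toy usage on simulated data with a non-linear f and area-level intercepts.
rng = np.random.default_rng(1)
m, n_per = 20, 25
area = np.repeat(np.arange(m), n_per)
X = rng.uniform(-2, 2, size=(m * n_per, 2))
u_true = rng.normal(0, 1.0, m)
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + u_true[area] + rng.normal(0, 0.5, m * n_per)
model = fit_merf(X, y, area)
means = predict_area_means(model, X, area)  # the sample stands in for the population
```

In this toy usage the sample doubles as the population, so the area estimates should track the area sample means. In a real application `X_pop` would hold covariates for every population unit of each area, for example from census or register data.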