Title
A statistical machine learning approach for benchmarking in the presence of complex contextual factors and peer groups
Authors
Abstract
The ability to compare individuals or organisations fairly is important for the development of robust and meaningful quantitative benchmarks. To make fair comparisons, contextual factors must be taken into account, and comparisons should be made only between similar organisations, such as peer groups. Previous benchmarking methods have used linear regression to adjust for contextual factors; however, linear regression is known to be sub-optimal when nonlinear relationships exist between the comparative measure and the covariates. In this paper we propose a random forest model for benchmarking that can adjust for these potential nonlinear relationships, and we validate the approach in a case study of high-noise data. We provide new visualisations and numerical summaries of the fitted models and comparative measures to facilitate interpretation by both analysts and non-technical audiences. Comparisons can be made across the whole cohort or within peer groups, and bootstrapping provides a means of estimating uncertainty in both the adjusted measures and the rankings. We conclude that random forest models can facilitate fair comparisons of quantitative measures between organisations, including in cases with complex contextual factor relationships, and that the models and their outputs are readily interpreted by stakeholders.
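The abstract does not say exactly how the adjusted measure, the peer-group comparison or the bootstrap are constructed, so the following is only a minimal sketch under stated assumptions. It assumes scikit-learn's `RandomForestRegressor` as the random forest. It takes the adjusted measure to be observed minus expected, where the expected value comes from out-of-fold (cross-fitted) predictions, so an organisation's own value never sets its benchmark. It also assumes a case-resampling bootstrap over organisations and that higher adjusted values rank better. All function names (`cross_fitted_expected`, `rank`, `peer_group_rank`, `benchmark`) are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# ASSUMPTIONS (not from the paper): adjusted = observed - expected, expected
# from cross-fitted random forest predictions, bootstrap over organisations.


def cross_fitted_expected(X, y, train_rows, rng, n_folds=5, n_trees=100):
    """Expected value for each organisation from a forest that never saw it.

    train_rows: row indices available for fitting; np.arange(n) for the
    original data, or a bootstrap resample of organisations.
    """
    n = len(y)
    fold = rng.permutation(n) % n_folds  # balanced fold label per organisation
    expected = np.empty(n)
    for k in range(n_folds):
        held_out = fold == k
        rows = train_rows[~held_out[train_rows]]  # drop every copy of held-out orgs
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   random_state=int(rng.integers(2**31)))
        rf.fit(X[rows], y[rows])
        expected[held_out] = rf.predict(X[held_out])
    return expected


def rank(scores, higher_is_better=True):
    """Rank 1 = best adjusted score; ties are broken by position."""
    order = np.argsort(-scores if higher_is_better else scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def peer_group_rank(scores, groups, higher_is_better=True):
    """Rank organisations only against members of their own peer group."""
    ranks = np.empty(len(scores), dtype=int)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        ranks[idx] = rank(scores[idx], higher_is_better)
    return ranks


def benchmark(X, y, n_boot=200, alpha=0.05, seed=0, higher_is_better=True, **rf_kw):
    """Adjusted measure, its cohort rank, and bootstrap percentile intervals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    adjusted = y - cross_fitted_expected(X, y, np.arange(n), rng, **rf_kw)
    boot_adj = np.empty((n_boot, n))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample organisations with replacement
        boot_adj[b] = y - cross_fitted_expected(X, y, rows, rng, **rf_kw)
    boot_rank = np.array([rank(a, higher_is_better) for a in boot_adj])
    q = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    return {
        "adjusted": adjusted,
        "rank": rank(adjusted, higher_is_better),
        "adjusted_ci": np.percentile(boot_adj, q, axis=0),  # shape (2, n)
        "rank_ci": np.percentile(boot_rank, q, axis=0),     # shape (2, n)
    }
```

As a usage example, `benchmark(X, y, n_boot=20, n_trees=50)` on a small synthetic cohort returns each organisation's adjusted score, its cohort rank and 95% intervals for both. Passing `out["adjusted"]` with a peer-group label array to `peer_group_rank` restricts the comparison to similar organisations. Cross-fitting is used here instead of in-sample predictions because a random forest fitted to all organisations tends to reproduce each organisation's own value, which would shrink the adjusted scores towards zero.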