Title

A Huber loss-based super learner with applications to healthcare expenditures

Authors

Ziyue Wu, David Benkeser

Abstract

Complex distributions of healthcare expenditures pose challenges for statistical modeling with a single model. Super learning, an ensemble method that combines a range of candidate models, is a promising alternative for cost estimation and has shown benefits over a single model. However, standard approaches to super learning may perform poorly in settings where extreme values are present, such as healthcare expenditure data. We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to down-weight the influence of outliers. We derive oracle inequalities that establish bounds on the finite-sample and asymptotic performance of the method. We show that the proposed method can be used both directly to optimize Huber risk and in finite-sample settings where optimizing mean squared error is the ultimate goal. For this latter scenario, we provide two methods for performing a grid search over values of the robustification parameter indexing the Huber loss. Simulations and real data analysis demonstrate appreciable finite-sample gains in cost prediction and causal effect estimation using our proposed method.
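As a minimal illustration of how a Huber loss combines squared error and absolute loss, the sketch below uses the standard textbook form of the loss. The function names `huber_loss` and `huber_risk`, the parameter name `delta` (standing in for the robustification parameter), and the toy cost values are assumptions made for this example; the paper's own parameterization and its super learner weighting and grid search procedures are not reproduced here.

```python
def huber_loss(y, pred, delta):
    """Standard Huber loss for a single observation.

    `delta` plays the role of the robustification parameter: residuals no
    larger than `delta` are penalized quadratically, larger ones linearly.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    r = abs(y - pred)
    if r <= delta:
        return 0.5 * r * r            # squared-error zone
    return delta * (r - 0.5 * delta)  # absolute-loss zone (linear growth)


def huber_risk(ys, preds, delta):
    """Empirical Huber risk: the average Huber loss over a sample."""
    if len(ys) != len(preds) or not ys:
        raise ValueError("ys and preds must be non-empty and of equal length")
    return sum(huber_loss(y, p, delta) for y, p in zip(ys, preds)) / len(ys)


if __name__ == "__main__":
    # Hypothetical costs with one extreme value, all predicted as 110.
    costs = [100.0, 120.0, 110.0, 5000.0]
    preds = [110.0] * len(costs)
    for delta in (10.0, 100.0, 10000.0):
        print(delta, huber_risk(costs, preds, delta))
```

When `delta` exceeds every residual, the Huber risk equals half the mean squared error, so very large values behave like squared error loss while smaller values limit the influence of the extreme observation.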
