A Huber loss-based super learner with applications to healthcare expenditures

Title

A Huber loss-based super learner with applications to healthcare expenditures

Authors

Ziyue Wu, David Benkeser

Abstract

Complex distributions of the healthcare expenditure pose challenges to statistical modeling via a single model. Super learning, an ensemble method that combines a range of candidate models, is a promising alternative for cost estimation and has shown benefits over a single model. However, standard approaches to super learning may have poor performance in settings where extreme values are present, such as healthcare expenditure data. We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to down-weight the influence of outliers. We derive oracle inequalities that establish bounds on the finite-sample and asymptotic performance of the method. We show that the proposed method can be used both directly to optimize Huber risk, as well as in finite-sample settings where optimizing mean squared error is the ultimate goal. For this latter scenario, we provide two methods for performing a grid search for values of the robustification parameter indexing the Huber loss. Simulations and real data analysis demonstrate appreciable finite-sample gains in cost prediction and causal effect estimation using our proposed method.
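The abstract describes the Huber loss as combining squared error loss with absolute loss. As a minimal sketch of that idea, the Python below uses the common textbook form of the loss: quadratic for small residuals and linear for large ones. The threshold name `delta` (standing in for the robustification parameter), the helper `huber_risk`, and the toy cost values are assumptions made for illustration; the paper's exact parametrization and its super learner implementation may differ.

```python
def huber_loss(residual: float, delta: float) -> float:
    """Textbook Huber loss: quadratic when |residual| <= delta, linear beyond it.

    `delta` is an illustrative name for the robustification parameter.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    abs_res = abs(residual)
    if abs_res <= delta:
        return 0.5 * residual ** 2
    # Linear branch, offset so the two pieces meet at |residual| == delta.
    return delta * (abs_res - 0.5 * delta)


def huber_risk(y, predictions, delta):
    """Empirical Huber risk: the average Huber loss over paired observations."""
    if len(y) != len(predictions):
        raise ValueError("y and predictions must have the same length")
    return sum(huber_loss(yi - pi, delta) for yi, pi in zip(y, predictions)) / len(y)


# Toy (made-up) costs with one extreme value, and a constant prediction.
costs = [100.0, 120.0, 90.0, 5000.0]
preds = [110.0, 110.0, 110.0, 110.0]

# A small delta limits how much the extreme observation dominates the risk;
# a very large delta makes every residual fall in the quadratic branch.
risk_small_delta = huber_risk(costs, preds, delta=50.0)
risk_large_delta = huber_risk(costs, preds, delta=1e6)
```

With a large enough `delta` every residual falls in the quadratic branch, so the Huber risk equals half the mean squared error. That is one way to read the abstract's point that a value of the robustification parameter can be searched over when mean squared error is the ultimate target.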
