Paper Title

Post-Selection Confidence Bounds for Prediction Performance

Authors

Pascal Rink, Werner Brannath

Abstract

In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks that need careful consideration. Typically, model selection and evaluation are strictly separated endeavors: the sample at hand is split into a training, validation, and evaluation set, and only a single confidence interval is computed for the prediction performance of the final selected model. We, however, propose an algorithm for computing valid lower confidence bounds for multiple models that have been selected based on their prediction performance on the evaluation set, by interpreting the selection problem as a simultaneous inference problem. We use bootstrap tilting and a maxT-type multiplicity correction. The approach is universally applicable to any combination of prediction models, any model selection strategy, and any prediction performance measure that accepts weights. We conducted various simulation experiments which show that our proposed approach yields lower confidence bounds that are at least as good as bounds from standard approaches, and that reliably attain the nominal coverage probability. In addition, especially when the sample size is small, our proposed approach yields better-performing prediction models than the default strategy of selecting only one model for evaluation.
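The abstract's core idea is that lower confidence bounds stay valid after selection if they are calibrated jointly over all candidate models via a maxT-type multiplicity correction on the evaluation set. The sketch below illustrates that idea under simplifying assumptions: it substitutes a plain studentized-bootstrap calibration of the maximum statistic for the bootstrap tilting used in the paper, uses accuracy as the performance measure, and picks three arbitrary scikit-learn classifiers. All model choices, data sizes, and parameter values are illustrative and not taken from the paper.

```python
# Minimal sketch (not the authors' exact procedure): simultaneous lower
# confidence bounds for the evaluation-set accuracy of several candidate
# models, calibrated with a bootstrap maxT-style critical value.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Split into training and evaluation sets (a validation set for tuning
# could be carved out of the training portion; omitted here).
X_tr, X_ev, y_tr, y_ev = train_test_split(X, y, test_size=0.3, random_state=0)

models = [
    LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    RandomForestClassifier(random_state=0).fit(X_tr, y_tr),
    DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr),
]

# Per-observation correctness indicators on the evaluation set; accuracy
# is their (possibly weighted) mean, i.e. a measure that "accepts weights".
score = np.stack([m.predict(X_ev) == y_ev for m in models]).astype(float)
theta_hat = score.mean(axis=1)                       # observed accuracies

B, alpha, n_ev = 2000, 0.05, score.shape[1]
t_star = np.empty((B, len(models)))
for b in range(B):
    idx = rng.integers(0, n_ev, n_ev)                # resample the evaluation set
    boot = score[:, idx]
    se_b = boot.std(axis=1, ddof=1) / np.sqrt(n_ev) + 1e-12
    t_star[b] = (boot.mean(axis=1) - theta_hat) / se_b

# maxT calibration: one critical value controls coverage jointly over all
# candidate models, so the lower bounds remain valid after selection.
c = np.quantile(t_star.max(axis=1), 1 - alpha)
se_hat = score.std(axis=1, ddof=1) / np.sqrt(n_ev) + 1e-12
lower_bounds = theta_hat - c * se_hat
for m, th, lb in zip(models, theta_hat, lower_bounds):
    print(f"{type(m).__name__:>24}: accuracy={th:.3f}, lower bound={lb:.3f}")
```

Because the calibration is simultaneous over all candidates, any model (or several models) can be picked afterwards based on its evaluation-set performance and its printed lower bound can still be reported at the nominal level; the paper's bootstrap-tilting construction refines this simple studentized-bootstrap variant.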
