Paper title
Stability of clinical prediction models developed using statistical or machine learning methods
Paper authors
Paper abstract
Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors. A developed model is a consequence of the development dataset and the chosen model building strategy, including the sample size, number of predictors, and analysis method (e.g., regression or machine learning). Here, we raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks, moving from the overall mean to the individual level. Then, through simulation and case studies of statistical and machine learning approaches, we show that instability in a model's estimated risks is often considerable, and ultimately manifests itself as miscalibration of predictions in new data. Therefore, we recommend that researchers always examine instability at the model development stage, and we propose instability plots and measures to do so. This entails repeating the model building steps (those used in the development of the original prediction model) in each of multiple (e.g., 1000) bootstrap samples to produce multiple bootstrap models, and then deriving (i) a prediction instability plot of bootstrap model predictions (y-axis) versus original model predictions (x-axis); (ii) a calibration instability plot showing calibration curves for the bootstrap models in the original sample; and (iii) the instability index, which is the mean absolute difference between individuals' original and bootstrap model predictions. A case study illustrates how these instability assessments help reassure (or not) that model predictions are likely to be reliable, whilst also informing a model's critical appraisal (risk of bias rating), fairness assessment, and further validation requirements.
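To make the proposed checks concrete, below is a minimal Python sketch of the bootstrap instability assessment the abstract describes. It is written under stated assumptions, not as the authors' implementation: the synthetic dataset, the `build_model` stand-in (a plain logistic regression), the number of bootstraps B, and the lowess smoother used for the calibration curves are all illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2024)

# --- Illustrative development data (replace with the real development dataset) ---
n, p = 500, 5
X = rng.normal(size=(n, p))
linpred = X @ np.array([0.8, -0.5, 0.3, 0.0, 0.0]) - 1.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

def build_model(X, y):
    # Stand-in for the full model building strategy; in practice, repeat every
    # step here (variable selection, tuning, etc.). penalty=None requests
    # unpenalized logistic regression (scikit-learn >= 1.2).
    return LogisticRegression(penalty=None, max_iter=1000).fit(X, y)

original_model = build_model(X, y)
p_orig = original_model.predict_proba(X)[:, 1]

# Refit the whole strategy in each of B bootstrap samples, then apply each
# bootstrap model to the ORIGINAL sample.
B = 1000
boot_preds = np.empty((B, n))
for b in range(B):
    idx = rng.integers(0, n, size=n)              # resample rows with replacement
    boot_model = build_model(X[idx], y[idx])      # redo all model building steps
    boot_preds[b] = boot_model.predict_proba(X)[:, 1]

# (iii) Instability index: mean absolute difference between individuals'
# original and bootstrap model predictions.
instability_index = np.mean(np.abs(boot_preds - p_orig))
print(f"Instability index: {instability_index:.4f}")

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# (i) Prediction instability plot: bootstrap predictions (y-axis) versus
# original model predictions (x-axis).
for b in range(0, B, 10):                         # thin the bootstraps for speed
    axes[0].scatter(p_orig, boot_preds[b], s=1, alpha=0.05, color="grey")
axes[0].plot([0, 1], [0, 1], color="black")       # line of perfect agreement
axes[0].set(xlabel="Original model prediction",
            ylabel="Bootstrap model prediction",
            title="Prediction instability")

# (ii) Calibration instability plot: a smoothed calibration curve (observed
# outcome versus predicted risk) for each bootstrap model, in the original sample.
for b in range(0, B, 10):
    curve = lowess(y, boot_preds[b], frac=0.75)   # columns: sorted x, smoothed y
    axes[1].plot(curve[:, 0], curve[:, 1], color="grey", alpha=0.1)
axes[1].plot([0, 1], [0, 1], color="black", linestyle="--")
axes[1].set(xlabel="Predicted risk", ylabel="Observed risk (smoothed)",
            title="Calibration instability")
plt.tight_layout()
plt.show()
```

The same scaffolding applies to any learner, statistical or machine learning, by swapping the body of `build_model`; with 1000 bootstraps the loop can be slow for computationally heavy strategies, so thinning the plotted curves (as above) keeps the figures readable and fast.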