Net benefit, calibration, threshold selection, and training objectives for algorithmic fairness in healthcare

Paper title

Net benefit, calibration, threshold selection, and training objectives for algorithmic fairness in healthcare

Authors

Pfohl, Stephen R.; Xu, Yizhe; Foryciarz, Agata; Ignatiadis, Nikolaos; Genkins, Julian; Shah, Nigam H.

Abstract

A growing body of work uses the paradigm of algorithmic fairness to frame the development of techniques to anticipate and proactively mitigate the introduction or exacerbation of health inequities that may follow from the use of model-guided decision-making. We evaluate the interplay between measures of model performance, fairness, and the expected utility of decision-making to offer practical recommendations for the operationalization of algorithmic fairness principles for the development and evaluation of predictive models in healthcare. We conduct an empirical case study by developing models to estimate the ten-year risk of atherosclerotic cardiovascular disease to inform statin initiation in accordance with clinical practice guidelines. We demonstrate that approaches that incorporate fairness considerations into the model training objective typically do not improve model performance or confer greater net benefit for any of the studied patient populations compared to the use of standard learning paradigms followed by threshold selection concordant with patient preferences, evidence of intervention effectiveness, and model calibration. These results hold when the measured outcomes are not subject to differential measurement error across patient populations and threshold selection is unconstrained, regardless of whether differences in model performance metrics, such as in true and false positive error rates, are present. In closing, we argue for focusing model development efforts on developing calibrated models that predict outcomes well for all patient populations while emphasizing that such efforts are complementary to transparent reporting, participatory design, and reasoning about the impact of model-informed interventions in context.
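The abstract frames evaluation in terms of net benefit at a decision threshold and calibration within each patient population. The following is a minimal sketch of those two quantities. It uses the standard decision-curve definition of net benefit, NB(t) = TP/n - (FP/n) * t/(1 - t), and uses calibration-in-the-large (mean predicted risk minus observed event rate) as a simple calibration summary. This is not the paper's code. The function names, the toy data, and the 0.075 threshold are illustrative assumptions, and the paper may use different or richer calibration measures.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone whose risk >= threshold.

    NB(t) = TP/n - (FP/n) * t / (1 - t): the odds t / (1 - t) act as the exchange
    rate between false positives and true positives implied by choosing t.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    treated = [(y, p) for y, p in zip(y_true, risk) if p >= threshold]
    tp = sum(1 for y, _ in treated if y == 1)
    fp = sum(1 for y, _ in treated if y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def calibration_in_the_large(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate (0.0 = calibrated on average)."""
    n = len(y_true)
    return sum(risk) / n - sum(y_true) / n


def per_group_report(
    y_true: Sequence[int],
    risk: Sequence[float],
    group: Sequence[Hashable],
    threshold: float,
) -> Dict[Hashable, Dict[str, float]]:
    """Compute both metrics separately within each patient group."""
    # Each group maps to a pair of lists: (outcomes, predicted risks).
    by_group: Dict[Hashable, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for y, p, g in zip(y_true, risk, group):
        by_group[g][0].append(y)
        by_group[g][1].append(p)
    return {
        g: {
            "net_benefit": net_benefit(ys, ps, threshold),
            "calibration_in_the_large": calibration_in_the_large(ys, ps),
        }
        for g, (ys, ps) in by_group.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: binary outcomes, predicted risks, and group labels.
    y = [1, 0, 0, 1, 0, 0, 0, 1]
    risk = [0.30, 0.10, 0.05, 0.08, 0.02, 0.20, 0.04, 0.50]
    group = ["A", "A", "A", "A", "B", "B", "B", "B"]
    # Illustrative threshold only; in practice t should reflect patient preferences
    # and evidence on intervention effectiveness.
    for g, metrics in per_group_report(y, risk, group, threshold=0.075).items():
        print(g, metrics)
```

The sketch reports both metrics per group rather than only pooled, which matches the abstract's emphasis on models that are calibrated and beneficial for every studied population. In decision-curve analysis, the usual comparisons are against "treat none" (NB = 0) and "treat all" (NB = prevalence - (1 - prevalence) * t/(1 - t)).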
