Title
Towards Quality Management of Machine Learning Systems for Medical Applications
Authors
Abstract
The use of machine learning systems in clinical routine is still hampered by the need for medical device certification and/or by the difficulty of implementing these systems in a clinic's quality management system. In this context, the key questions for a user are how to ensure reliable model predictions and how to appraise the quality of a model's results on a regular basis. In this paper we first review why the common out-of-sample performance metrics are not sufficient for assessing the robustness of model predictions. We discuss some conceptual foundations for the clinical implementation of a machine learning system and argue that both vendors and users should take on certain responsibilities, as is already common practice for high-risk medical equipment. Along these lines, we revisit best practices for dealing with the robustness (or lack thereof) of machine learning models. We propose the methodology of AAPM Task Group 100 (report no. 283) as a natural framework for developing a quality management program for a clinical process that includes a machine learning system. This is illustrated with an explicit, albeit generic, example. Our analysis shows how the risk evaluation in this framework can accommodate machine learning systems independently of their robustness evaluation. In particular, we highlight how the degree of interpretability of a machine learning system can be accounted for systematically within the risk evaluation and in the development of a quality management system.
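The risk evaluation in AAPM Task Group 100 is based on failure modes and effects analysis (FMEA): each failure mode of a process is scored for occurrence (O), severity (S) and lack of detectability (D), each on a scale from 1 to 10, and failure modes are ranked by the risk priority number RPN = O × S × D. The Python sketch below illustrates only that scoring step. The class and function names and all scores are hypothetical, and representing interpretability as a lower D score (an error that is easier for the user to spot) is an illustrative assumption, not necessarily the scheme used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA table in the style of TG-100 (hypothetical helper).

    O, S and D are each scored on a 1-10 scale; a higher D means the
    failure is *less* likely to be detected before it causes harm.
    """
    description: str
    occurrence: int     # O: likelihood that the failure occurs
    severity: int       # S: consequence if the failure goes undetected
    detectability: int  # D: 1 = almost always caught, 10 = almost never caught

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 range used for O, S and D.
        for name in ("occurrence", "severity", "detectability"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number RPN = O * S * D (range 1..1000)."""
        return self.occurrence * self.severity * self.detectability


def rank_by_rpn(modes: list[FailureMode]) -> list[FailureMode]:
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores for illustration only (not taken from the paper):
# the same ML prediction error scored once for an opaque model and once for
# a model whose output comes with an explanation the user can check, which
# is assumed here to make the error easier to detect (lower D).
opaque = FailureMode("ML prediction error, opaque model", 4, 7, 8)
explained = FailureMode("ML prediction error, interpretable model", 4, 7, 4)

for mode in rank_by_rpn([explained, opaque]):
    print(f"{mode.description}: RPN = {mode.rpn}")
```

With these made-up scores the opaque model's failure mode ranks first (RPN 224 versus 112), so in a TG-100 style analysis it would be the first candidate for additional quality management measures. Keeping O and S fixed isolates the effect that the assumed difference in detectability has on the ranking.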