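A Generalizable Artificial Intelligence Model for COVID-19 Classification Task Using Chest X-ray Radiographs: Evaluated Over Four Clinical Datasets with 15,097 Patients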

Paper title

A Generalizable Artificial Intelligence Model for COVID-19 Classification Task Using Chest X-ray Radiographs: Evaluated Over Four Clinical Datasets with 15,097 Patients

Authors

Ran Zhang, Xin Tie, John W. Garrett, Dalton Griner, Zhihua Qi, Nicholas B. Bevins, Scott B. Reeder, Guang-Hong Chen

Abstract

Purpose: To answer the long-standing question of whether a model trained on data from a single clinical site can generalize to external sites. Materials and Methods: 17,537 chest x-ray radiographs (CXRs) from 3,264 COVID-19-positive patients and 4,802 COVID-19-negative patients were collected from a single site for AI model development. The generalizability of the trained model was retrospectively evaluated using four different real-world clinical datasets with a total of 26,633 CXRs from 15,097 patients (3,277 COVID-19-positive patients). The area under the receiver operating characteristic curve (AUC) was used to assess diagnostic performance. Results: The AI model trained using a single-source clinical dataset achieved an AUC of 0.82 (95% CI: 0.80, 0.84) when applied to the internal temporal test set. When applied to datasets from two external clinical sites, AUCs of 0.81 (95% CI: 0.80, 0.82) and 0.82 (95% CI: 0.80, 0.84) were achieved. An AUC of 0.79 (95% CI: 0.77, 0.81) was achieved when applied to a multi-institutional COVID-19 dataset collected by the Medical Imaging and Data Resource Center (MIDRC). A power-law dependence, N^k (with k empirically found to be -0.21 to -0.25), indicates a relatively weak dependence of performance on the training data size N. Conclusion: A COVID-19 classification AI model trained using well-curated data from a single clinical site is generalizable to external clinical sites without a significant drop in performance.
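
To make the reported metric concrete, here is a minimal sketch of how an AUC and a 95% confidence interval can be computed from per-image labels and model scores. The abstract does not say how the intervals were obtained; the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_ci`) and the toy data are hypothetical, not taken from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the paper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; AUC undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic toy scores, for illustration only: positives score higher on average.
    rng = random.Random(42)
    labels = [1] * 30 + [0] * 30
    scores = [rng.gauss(0.6, 0.2) for _ in range(30)] + [rng.gauss(0.4, 0.2) for _ in range(30)]
    print("AUC:", auc(labels, scores))
    print("95% CI:", bootstrap_ci(labels, scores))
```

The pairwise comparison is quadratic in the number of cases and is written for readability; for datasets of the size described above, a rank-based implementation from an established library would be the practical choice.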
