Paper Title
Virtual vs. Reality: External Validation of COVID-19 Classifiers using XCAT Phantoms for Chest Computed Tomography
Authors
Abstract
Research on artificial intelligence models in medical imaging has been hampered by poor generalization. This problem has been especially concerning over the last year, with numerous applications of deep learning to COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work, using VITs, we created the CVIT-COVID dataset, which includes 180 virtually imaged computed tomography (CT) images from simulated COVID-19 and normal phantom models under different COVID-19 morphologies and imaging properties. We evaluated the performance of an open-source deep-learning model from the University of Waterloo, trained with multi-institutional data, and an in-house model trained with the open clinical dataset MosMed. We further validated model performance against open clinical data of 305 CT images to compare performance on virtual and real clinical data. The open-source model was published with nearly perfect performance on the original Waterloo dataset but showed a consistent performance drop in external testing, both on another clinical dataset (AUC=0.77) and on our simulated CVIT-COVID dataset (AUC=0.55). The in-house model achieved an AUC of 0.87 on the internal test set (MosMed test set). However, its performance dropped to AUCs of 0.65 and 0.69 when evaluated on the external clinical dataset and our simulated CVIT-COVID dataset, respectively. The VIT framework offered control over imaging conditions, allowing us to show that performance did not change as CT exposure was changed from 28.5 to 57 mAs. The VIT framework also provided voxel-level ground truth, revealing that the in-house model performed much better on diffuse COVID-19 infection occupying >2.65% of lung volume (AUC=0.87) than on focal disease occupying <2.65% of lung volume (AUC=0.52). The virtual imaging framework enabled these uniquely rigorous analyses of model performance.
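The infection-size analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, and the choice to compare each disease subgroup against all normal cases are assumptions made for illustration. Only the 2.65% lung-volume threshold comes from the abstract.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auc(
    labels: Sequence[int],
    scores: Sequence[float],
    infection_pct: Sequence[float],
    threshold: float = 2.65,
) -> Dict[str, float]:
    """AUC for focal (< threshold) and diffuse (> threshold) disease.

    Assumption: each subgroup is scored against all normal cases.
    Cases exactly at the threshold fall in neither subgroup.
    """
    normals = [(y, s) for y, s in zip(labels, scores) if y == 0]
    groups = (
        ("focal", lambda v: v < threshold),
        ("diffuse", lambda v: v > threshold),
    )
    result = {}
    for name, keep in groups:
        cases = [
            (y, s)
            for y, s, v in zip(labels, scores, infection_pct)
            if y == 1 and keep(v)
        ]
        ys, ss = zip(*(normals + cases))
        result[name] = auc(ys, ss)
    return result


# Hypothetical toy data: 3 normal phantoms and 4 COVID-19 phantoms.
labels = [0, 0, 0, 1, 1, 1, 1]
scores = [0.2, 0.4, 0.6, 0.3, 0.5, 0.8, 0.9]          # model output per scan
infection_pct = [0.0, 0.0, 0.0, 1.0, 2.0, 5.0, 10.0]  # % lung volume infected

result = stratified_auc(labels, scores, infection_pct)
print(result)  # {'focal': 0.5, 'diffuse': 1.0}
```

In a virtual imaging trial the infected lung fraction is known exactly from the phantom's voxel-level ground truth, which is what makes this kind of stratification possible without manual annotation.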