CheXpedition: Investigating Generalization Challenges for Translation of Chest X-Ray Algorithms to the Clinical Setting

Paper Title

CheXpedition: Investigating Generalization Challenges for Translation of Chest X-Ray Algorithms to the Clinical Setting

Paper Authors

Rajpurkar, Pranav; Joshi, Anirudh; Pareek, Anuj; Chen, Phil; Kiani, Amirhossein; Irvin, Jeremy; Ng, Andrew Y.; Lungren, Matthew P.

Abstract

Although there have been several recent advances in the application of deep learning algorithms to chest x-ray interpretation, we identify three major challenges for the translation of chest x-ray algorithms to the clinical setting. We examine the performance of the top 10 performing models on the CheXpert challenge leaderboard on three tasks: (1) tuberculosis (TB) detection, (2) pathology detection on photos of chest x-rays, and (3) pathology detection on data from an external institution. First, we find that the top 10 chest x-ray models on the CheXpert competition achieve an average area under the ROC curve (AUC) of 0.851 on the task of detecting TB on two public TB datasets without fine-tuning or including TB labels in the training data. Second, we find that the average performance of the models on photos of x-rays (AUC = 0.916) is similar to their performance on the original chest x-ray images (AUC = 0.924). Third, we find that the models tested on an external dataset either perform comparably to or exceed the average performance of radiologists. We believe that our investigation will inform rapid translation of deep learning algorithms to safe and effective clinical decision support tools that can be validated prospectively with large impact studies and clinical trials.
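The headline figures in the abstract are averages of per-model AUCs on a shared test set. The sketch below illustrates that kind of summary under stated assumptions: one binary label per image, one score per image from each model, and hypothetical toy data and model names. It uses the standard rank (Mann-Whitney) interpretation of ROC AUC, swapped in here for illustration; it is not the authors' evaluation code.

```python
from itertools import product
from statistics import mean


def auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(labels, scores_by_model):
    """Average the per-model AUCs computed on the same labeled test set."""
    return mean(auc(labels, s) for s in scores_by_model.values())


# Hypothetical toy data: 1 = finding present, 0 = absent.
labels = [1, 0, 1, 0, 1, 0]
scores_by_model = {
    "model_a": [0.9, 0.2, 0.8, 0.3, 0.6, 0.4],  # hypothetical model
    "model_b": [0.7, 0.5, 0.4, 0.6, 0.9, 0.1],  # hypothetical model
}
print(round(mean_auc(labels, scores_by_model), 3))
```

Comparing two conditions, such as original images versus photos of the same images, would amount to calling `mean_auc` once per condition with the same labels and each model's scores on that condition.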
