Paper Title

Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures

Authors

Mosquera, Candelaria; Diaz, Facundo Nahuel; Binder, Fernando; Rabellino, Jose Martin; Benitez, Sonia Elizabeth; Beresñak, Alejandro Daniel; Seehaus, Alberto; Ducrey, Gabriel; Ocantos, Jorge Alberto; Luna, Daniel Roberto

Abstract

BACKGROUND AND OBJECTIVES: The multiple chest x-ray datasets released in recent years have ground-truth labels intended for different computer vision tasks, suggesting that automated chest x-ray interpretation might improve with a method that can exploit diverse types of annotations. This work presents a Deep Learning method based on the late fusion of different convolutional architectures, which allows training with heterogeneous data through a simple implementation, and evaluates its performance on independent test data. We focused on obtaining a clinically useful tool that could be successfully integrated into a hospital workflow. MATERIALS AND METHODS: Based on expert opinion, we selected four target chest x-ray findings: lung opacities, fractures, pneumothorax and pleural effusion. For each finding we defined the most adequate type of ground-truth label and built four training datasets combining images from public chest x-ray datasets and our institutional archive. We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool. Performance was measured on two test datasets: an external, openly available dataset and a retrospective institutional dataset, the latter to estimate performance on the local population. RESULTS: On the external and local test sets (4376 and 1064 images, respectively), the model showed an area under the Receiver Operating Characteristic curve of 0.75 (95%CI: 0.74-0.76) and 0.87 (95%CI: 0.86-0.89) in the detection of abnormal chest x-rays. For the local population, a sensitivity of 86% (95%CI: 84-90) and a specificity of 88% (95%CI: 86-90) were obtained, with no significant differences between demographic subgroups. We present examples of heatmaps to show the level of interpretability achieved, examining true and false positives.
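As a rough illustration of the late fusion idea described in the abstract, the sketch below combines four per-finding probabilities into a single abnormality score and then computes sensitivity and specificity at a chosen threshold. The abstract does not state the fusion rule, so the max rule here is an assumption. The finding keys, function names, and threshold are also hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical keys for the four target findings named in the abstract.
FINDINGS = ("lung_opacity", "fracture", "pneumothorax", "pleural_effusion")


def late_fusion(scores: Dict[str, float]) -> float:
    """Fuse per-finding probabilities into one abnormality score.

    Assumption: a max rule, i.e. the image is as abnormal as its most
    confident finding. The paper's actual fusion strategy may differ.
    """
    missing = [name for name in FINDINGS if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return max(scores[name] for name in FINDINGS)


def sensitivity_specificity(
    y_true: List[int], y_score: List[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = abnormal)."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_abnormal = score >= threshold
        if label == 1:
            if predicted_abnormal:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_abnormal:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Usage with made-up model outputs for a single image.
fused = late_fusion(
    {
        "lung_opacity": 0.10,
        "fracture": 0.70,
        "pneumothorax": 0.20,
        "pleural_effusion": 0.05,
    }
)

# Usage with made-up labels and fused scores for four images.
sens, spec = sensitivity_specificity([1, 1, 0, 0], [0.9, 0.3, 0.2, 0.6], 0.5)
```

A max rule favours sensitivity, since any single confident finding flags the image. Averaging or a learned combiner would trade that off differently.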
