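How well do U-Net-based segmentation trained on adult cardiac magnetic resonance imaging data generalise to rare congenital heart diseases for surgical planning?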

Paper title

How well do U-Net-based segmentation trained on adult cardiac magnetic resonance imaging data generalise to rare congenital heart diseases for surgical planning?

Authors

Koehler, Sven; Tandon, Animesh; Hussain, Tarique; Latus, Heiner; Pickardt, Thomas; Sarikouch, Samir; Beerbaum, Philipp; Greil, Gerald; Engelhardt, Sandy; Wolf, Ivo

Abstract

According to current guidelines, the optimal time of intervention for pulmonary valve replacement surgery in patients with the congenital heart disease Tetralogy of Fallot (TOF) is planned mainly on the basis of ventricular volume and function. Both biomarkers are most reliably assessed by segmentation of 3D cardiac magnetic resonance (CMR) images. In several grand challenges in recent years, U-Net architectures have shown impressive results on the provided data. However, in clinical practice, data sets are more diverse with respect to individual pathologies and image properties that arise from different scanner properties. Additionally, specific training data for complex rare diseases like TOF is scarce. In this work, we assessed 1) the accuracy gap when training on a publicly available labelled data set (the Automatic Cardiac Diagnosis Challenge (ACDC) data set) and subsequently applying the model to CMR data of TOF patients, and vice versa, and 2) whether similar results can be achieved when applying the model to a more heterogeneous database. Multiple deep learning models were trained with four-fold cross-validation. Afterwards, they were evaluated on the respective unseen CMR images from the other collection. Our results confirm that current deep learning models can achieve excellent results within a single data collection (left ventricle Dice of $0.951\pm{0.003}$/$0.941\pm{0.007}$ train/validation). But once they are applied to other pathologies, it becomes apparent how much they overfit to the training pathologies (the Dice score drops by between $0.072\pm{0.001}$ for the left ventricle and $0.165\pm{0.001}$ for the right ventricle).
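The Dice score quoted in the abstract measures the overlap between a predicted segmentation and a reference segmentation, computed as 2|A∩B| / (|A| + |B|) per structure. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes masks flattened to sequences of integer labels. The function name `dice_score`, the label convention (0 = background, 1 = left ventricle, 2 = right ventricle), and the toy masks are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for one label over flattened masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Number of voxels assigned to the label in each mask
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    # Voxels where both masks agree on the label
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # Label absent from both masks: treated as perfect agreement (a convention, assumed here)
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


# Toy masks under the hypothetical label convention described above
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 0, 2, 2, 2, 0]

lv = dice_score(pred, truth, label=1)  # left ventricle
rv = dice_score(pred, truth, label=2)  # right ventricle
print(f"LV Dice: {lv:.3f}, RV Dice: {rv:.3f}")
```

Under this definition, the drops reported in the abstract would correspond to the difference between the Dice obtained within one data collection and the Dice obtained when the same model is evaluated on the other collection.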
