Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound

Paper Title

Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound

Authors

Wilson, Paul F. R.; Gilany, Mahdi; Jamzad, Amoon; Fooladgar, Fahimeh; To, Minh Nguyen Nhat; Wodlinger, Brian; Abolmaesumi, Purang; Mousavi, Parvin

Abstract

Deep learning-based analysis of high-frequency, high-resolution micro-ultrasound data shows great promise for prostate cancer detection. Previous approaches to analyzing ultrasound data have largely followed a supervised learning paradigm. The ground-truth labels used to train deep networks on ultrasound images are often coarse annotations derived from histopathological analysis of tissue samples obtained via biopsy. This inherently limits both the availability and the quality of labeled data, posing major challenges for supervised learning methods. Unlabeled prostate ultrasound data, on the other hand, are more abundant. In this work, we successfully apply self-supervised representation learning to micro-ultrasound data. Using ultrasound data from 1028 biopsy cores of 391 subjects obtained at two clinical centers, we demonstrate that feature representations learned with this method can be used to distinguish cancerous from non-cancerous tissue, obtaining an AUROC of 91% on an independent test set. To the best of our knowledge, this is the first successful end-to-end self-supervised learning approach for prostate cancer detection using ultrasound data. Our method outperforms baseline supervised learning approaches, generalizes well across data from different clinical centers, and scales well in performance as more unlabeled data are added, making it a promising approach for future research using large volumes of unlabeled data.
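
The abstract does not name the self-supervised objective used. As a hedged illustration only, the sketch below implements a VICReg-style (variance-invariance-covariance regularization) loss, one widely used objective for learning representations from unlabeled data: embeddings of two augmented views of the same unlabeled patch are pulled together (invariance), while the variance and covariance terms keep the embeddings from collapsing to a constant. The function name `vicreg_loss`, the NumPy implementation, and the default weights (25, 25, 1, the defaults from the original VICReg paper) are assumptions for illustration, not the authors' code.

```python
import numpy as np


def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    """Hypothetical VICReg-style loss (illustrative sketch, not the paper's code).

    z_a, z_b: arrays of shape (batch, dim) holding embeddings of two augmented
    views of the same batch of unlabeled samples (row i of z_a pairs with row i of z_b).
    """
    n, d = z_a.shape

    # Invariance: mean squared error between paired embeddings of the two views.
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge on the per-dimension standard deviation: penalize dimensions
        # whose std across the batch falls below gamma (guards against collapse).
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance so that dimensions carry
        # decorrelated information.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return sim_w * invariance + var_w * variance + cov_w * covariance


# Usage: a collapsed encoder (all-zero embeddings) is heavily penalized,
# while well-spread, view-consistent embeddings give a much smaller loss.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8)) * 2.0
print(vicreg_loss(np.zeros((256, 8)), np.zeros((256, 8))))
print(vicreg_loss(z, z))
```

In a typical two-stage setup, an encoder trained with such a loss on unlabeled data is then paired with a small classifier fit on the limited labeled biopsy cores; this is a common pattern, and the abstract does not state exactly how the authors attach or train their classifier.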