Evaluating the Performance of StyleGAN2-ADA on Medical Images

Paper Title

Evaluating the Performance of StyleGAN2-ADA on Medical Images

Authors

Woodland, McKell; Wood, John; Anderson, Brian M.; Kundu, Suprateek; Lin, Ethan; Koay, Eugene; Odisio, Bruno; Chung, Caroline; Kang, Hyunseon Christine; Venkatesan, Aradhana M.; Yedururi, Sireesha; De, Brian; Lin, Yuan-Mao; Patel, Ankit B.; Brock, Kristy K.

Abstract

Although generative adversarial networks (GANs) have shown promise in medical imaging, they have four main limitations that impede their utility: computational cost, data requirements, reliable evaluation measures, and training complexity. Our work investigates each of these obstacles in a novel application of StyleGAN2-ADA to high-resolution medical imaging datasets. Our dataset consists of liver-containing axial slices from non-contrast and contrast-enhanced computed tomography (CT) scans. Additionally, we utilized four public datasets composed of various imaging modalities. We trained a StyleGAN2 network with transfer learning (from the Flickr-Faces-HQ dataset) and data augmentation (horizontal flipping and adaptive discriminator augmentation). The network's generative quality was measured quantitatively with the Fréchet Inception Distance (FID) and qualitatively with a visual Turing test given to seven radiologists and radiation oncologists. The StyleGAN2-ADA network achieved an FID of 5.22 ($\pm$ 0.17) on our liver CT dataset. It also set new record FIDs of 10.78, 3.52, 21.17, and 5.39 on the publicly available SLIVER07, ChestX-ray14, ACDC, and Medical Segmentation Decathlon (brain tumors) datasets, respectively. In the visual Turing test, the clinicians rated generated images as real 42% of the time, approaching random guessing. Our computational ablation study revealed that transfer learning and data augmentation stabilize training and improve the perceptual quality of the generated images. We observed the FID to be consistent with human perceptual evaluation of medical images. Finally, our work found that StyleGAN2-ADA consistently produces high-quality results without hyperparameter searches or retraining.
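
For readers unfamiliar with the metric, the FID reported above is the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The following is a minimal sketch of that distance only, assuming the features have already been extracted into NumPy arrays with one row per image. The function name `frechet_distance` and the array layout are illustrative assumptions, not taken from the paper, and the random arrays in the usage lines merely stand in for real Inception features.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs are 2-D arrays of shape (n_samples, n_features).
    This is an illustrative sketch, not the paper's evaluation code.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative for
    # positive semi-definite covariances. Clip tiny negative values that
    # come from floating-point round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


# Usage with stand-in features (random data, not real Inception activations).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
fake = real + 2.0  # same covariance, mean shifted by 2 in each of 4 dimensions
print(frechet_distance(real, real))  # close to 0 for identical sets
print(frechet_distance(real, fake))  # close to 4 * 2**2 = 16
```

Lower values mean the two feature distributions are closer. A full FID computation would additionally need an Inception feature extractor, which is omitted here.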
