Paper Title

MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling

Paper Authors

Kelei He, Chunfeng Lian, Ehsan Adeli, Jing Huo, Yang Gao, Bing Zhang, Junfeng Zhang, Dinggang Shen

Paper Abstract

Fully convolutional networks (FCNs), including UNet and VNet, are widely used network architectures for semantic segmentation in recent studies. However, conventional FCNs are typically trained with the cross-entropy or Dice loss, which only computes the error between predictions and ground-truth labels for each pixel individually. This often results in non-smooth neighborhoods in the predicted segmentation. To address this problem, we propose a two-stage framework, in which the first stage quickly localizes the prostate region and the second stage precisely segments the prostate with a multi-task UNet architecture. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. The proposed network therefore has a dual-branch architecture that tackles two tasks: 1) a segmentation sub-network that generates the prostate segmentation, and 2) a voxel-metric learning sub-network that improves the quality of the learned feature space under the supervision of a metric loss. Specifically, the voxel-metric learning sub-network samples tuples (triplets and pairs) at the voxel level from the intermediate feature maps. Unlike conventional deep metric learning methods that generate triplets or pairs at the image level before training, our voxel-wise tuples are sampled in an online manner and processed end-to-end via multi-task learning. To evaluate the proposed method, we conduct extensive experiments on a real CT image dataset consisting of 339 patients. Ablation studies show that our method learns more representative voxel-level features than conventional training with cross-entropy or Dice loss, and comparisons show that the proposed method outperforms state-of-the-art methods by a reasonable margin.
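
The central idea of the abstract, sampling voxel-level triplets from intermediate feature maps during training and adding the resulting metric loss to the segmentation loss, can be illustrated with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' released implementation: the helper names (online_voxel_triplet_loss, multi_task_loss), tensor shapes, sampling counts, margin, and loss weight are all assumptions made for the example.

# Minimal sketch of online voxel-wise triplet sampling for a multi-task
# segmentation network. Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def online_voxel_triplet_loss(features, labels, num_triplets=128, margin=0.5):
    """Sample voxel-level triplets from an intermediate feature map on the fly.

    features: (B, C, D, H, W) intermediate feature map of the segmentation UNet
    labels:   (B, D, H, W) binary ground-truth mask (1 = prostate, 0 = background)
    """
    B, C = features.shape[:2]
    # Flatten spatial dimensions so each voxel becomes one feature vector.
    feats = features.view(B, C, -1).permute(0, 2, 1).reshape(-1, C)  # (N, C)
    labs = labels.reshape(-1)                                        # (N,)

    pos_idx = torch.nonzero(labs == 1, as_tuple=False).squeeze(1)
    neg_idx = torch.nonzero(labs == 0, as_tuple=False).squeeze(1)
    if pos_idx.numel() < 2 or neg_idx.numel() < 1:
        return feats.new_zeros(())  # degenerate batch: skip the metric term

    # Online sampling: anchors/positives from prostate voxels, negatives from background.
    a = pos_idx[torch.randint(pos_idx.numel(), (num_triplets,))]
    p = pos_idx[torch.randint(pos_idx.numel(), (num_triplets,))]
    n = neg_idx[torch.randint(neg_idx.numel(), (num_triplets,))]

    anchor = F.normalize(feats[a], dim=1)
    positive = F.normalize(feats[p], dim=1)
    negative = F.normalize(feats[n], dim=1)
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)


def multi_task_loss(seg_logits, features, labels, lambda_metric=0.1):
    # Conventional voxel-wise segmentation loss (cross-entropy here; a Dice
    # loss could be used instead) plus the online voxel-metric loss.
    seg_loss = F.cross_entropy(seg_logits, labels.long())
    metric_loss = online_voxel_triplet_loss(features, labels)
    return seg_loss + lambda_metric * metric_loss

Because the triplets are drawn from the current feature maps at every training step, the metric term adapts to the network as it trains, which is the "online" aspect the abstract contrasts with precomputed image-level triplets.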
