Paper title

The impact of using voxel-level segmentation metrics on evaluating multifocal prostate cancer localisation

Authors

Wen Yan, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, Dean C. Barratt, Bernard Chiu, Yipeng Hu

Abstract

Dice similarity coefficient (DSC) and Hausdorff distance (HD) are widely used for evaluating medical image segmentation. When reported alone, they have also been criticised for their unclear or even misleading clinical interpretation. DSCs may also differ substantially from HDs, due to boundary smoothness or multiple regions of interest (ROIs) within a subject. More importantly, either metric can have a nonlinear, non-monotonic relationship with outcomes based on Type 1 and Type 2 errors, which are designed for specific clinical decisions that use the resulting segmentation. Cases causing disagreement between these metrics are not difficult to postulate. This work first proposes a new asymmetric detection metric, adapting those used in object detection, for planning prostate cancer procedures. The lesion-level metric is then compared with the voxel-level DSC and HD, with a 3D UNet used for segmenting lesions from multiparametric MR (mpMR) images. Based on experimental results, we report pairwise agreement and correlation 1) between DSC and HD, and 2) between voxel-level DSC and recall-controlled precision at lesion level, with Cohen's kappa of [0.49, 0.61] and Pearson's correlation of [0.66, 0.76] (p-values < 0.001) at varying cut-offs. However, the differences in false positives and false negatives, between the actual errors and the perceived counterparts if DSC is used, can be as high as 152 and 154, respectively, out of the 357 test-set lesions. We therefore cautiously conclude that, despite the significant correlations, voxel-level metrics such as DSC can misrepresent lesion-level detection accuracy for evaluating localisation of multifocal prostate cancer and should be interpreted with caution.
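To make the abstract's central point concrete, the following is a minimal sketch of how a voxel-level DSC can look excellent while a lesion is missed entirely. It is not the asymmetric metric proposed in the paper: the function `lesion_recall`, its `min_overlap` threshold, and the toy 2D masks are all illustrative assumptions, and a plain per-lesion overlap rule stands in for the paper's lesion-level criterion.

```python
import numpy as np


def dice(pred, gt):
    """Voxel-level Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom


def lesion_recall(pred, gt_labels, min_overlap=0.5):
    """Fraction of ground-truth lesions counted as detected.

    Hypothetical rule (an assumption, not the paper's metric): a lesion is
    detected if at least `min_overlap` of its voxels are predicted positive.
    `gt_labels` is an integer map with 0 for background and 1..K per lesion.
    """
    pred = pred.astype(bool)
    lesion_ids = [i for i in np.unique(gt_labels) if i != 0]
    if not lesion_ids:
        return 1.0  # no lesions to find
    hits = sum(pred[gt_labels == i].mean() >= min_overlap for i in lesion_ids)
    return hits / len(lesion_ids)


# Toy multifocal case: one large lesion (100 voxels) and one small (4 voxels).
gt_labels = np.zeros((20, 20), dtype=int)
gt_labels[2:12, 2:12] = 1
gt_labels[15:17, 15:17] = 2

# The prediction recovers the large lesion exactly and misses the small one.
pred = gt_labels == 1

dsc = dice(pred, gt_labels > 0)
recall = lesion_recall(pred, gt_labels)
print(f"voxel-level DSC = {dsc:.3f}, lesion-level recall = {recall:.2f}")
```

In this toy case the DSC is 200/204, about 0.98, yet only one of the two lesions is found, which is the kind of disagreement between voxel-level and lesion-level views that the abstract warns about.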
