CIRDataset: A large-scale Dataset for Clinically-Interpretable lung nodule Radiomics and malignancy prediction

Paper title


CIRDataset: A large-scale Dataset for Clinically-Interpretable lung nodule Radiomics and malignancy prediction

Authors

Choi, Wookjin; Dahiya, Navdeep; Nadeem, Saad

Abstract


Spiculations/lobulations, sharp/curved spikes on the surface of lung nodules, are good predictors of lung cancer malignancy and hence, are routinely assessed and reported by radiologists as part of the standardized Lung-RADS clinical scoring criteria. Given the 3D geometry of the nodule and 2D slice-by-slice assessment by radiologists, manual spiculation/lobulation annotation is a tedious task and thus no public datasets exist to date for probing the importance of these clinically-reported features in the SOTA malignancy prediction algorithms. As part of this paper, we release a large-scale Clinically-Interpretable Radiomics Dataset, CIRDataset, containing 956 radiologist QA/QC'ed spiculation/lobulation annotations on segmented lung nodules from two public datasets, LIDC-IDRI (N=883) and LUNGx (N=73). We also present an end-to-end deep learning model based on multi-class Voxel2Mesh extension to segment nodules (while preserving spikes), classify spikes (sharp/spiculation and curved/lobulation), and perform malignancy prediction. Previous methods have performed malignancy prediction for LIDC and LUNGx datasets but without robust attribution to any clinically reported/actionable features (due to known hyperparameter sensitivity issues with general attribution schemes). With the release of this comprehensively-annotated CIRDataset and end-to-end deep learning baseline, we hope that malignancy prediction methods can validate their explanations, benchmark against our baseline, and provide clinically-actionable insights. Dataset, code, pretrained models, and docker containers are available at https://github.com/nadeemlab/CIR.
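The abstract describes one end-to-end model that handles three tasks at once: nodule segmentation, per-vertex spike classification (sharp/spiculation vs. curved/lobulation), and malignancy prediction. A common way to train such a model is a weighted sum of per-task losses. The sketch below illustrates only that general multi-task pattern in plain Python. It is not the authors' published objective. The soft Dice term stands in for Voxel2Mesh's mesh-based losses. The vertex label encoding (0 = smooth surface, 1 = spiculation, 2 = lobulation), the function names, and the default weights are hypothetical.

```python
import math
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-6) -> float:
    """Soft Dice loss (1 - Dice coefficient) over flattened voxel probabilities.

    Hypothetical stand-in for the segmentation term; Voxel2Mesh itself
    uses mesh-based losses, which are not reproduced here.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def vertex_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean categorical cross-entropy over mesh vertices.

    Assumed (hypothetical) labels: 0 = smooth surface, 1 = spiculation, 2 = lobulation.
    """
    total = sum(math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels))
    return -total / len(labels)


def binary_cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for a single malignancy probability (clamped for stability)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(seg_pred, seg_target, vertex_probs, vertex_labels,
                   malig_prob, malig_label,
                   w_seg=1.0, w_spike=1.0, w_malig=1.0) -> float:
    """Weighted sum of the three task losses; the weights are illustrative assumptions."""
    return (w_seg * dice_loss(seg_pred, seg_target)
            + w_spike * vertex_cross_entropy(vertex_probs, vertex_labels)
            + w_malig * binary_cross_entropy(malig_prob, malig_label))


if __name__ == "__main__":
    # Toy example: 4 voxels, 3 vertices, one nodule (all values made up).
    loss = multitask_loss(
        seg_pred=[0.9, 0.8, 0.1, 0.0], seg_target=[1, 1, 0, 0],
        vertex_probs=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
        vertex_labels=[0, 1, 2],
        malig_prob=0.6, malig_label=1,
    )
    print(f"toy multi-task loss: {loss:.4f}")
```

Summing the task losses lets one backward pass update a shared encoder for all three outputs. The relative weights trade segmentation fidelity against classification accuracy and would need tuning on real data.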
