Paper title
CIRDataset: A large-scale Dataset for Clinically-Interpretable lung nodule Radiomics and malignancy prediction
Paper authors
Paper abstract
Spiculations/lobulations, sharp/curved spikes on the surface of lung nodules, are good predictors of lung cancer malignancy and hence, are routinely assessed and reported by radiologists as part of the standardized Lung-RADS clinical scoring criteria. Given the 3D geometry of the nodule and 2D slice-by-slice assessment by radiologists, manual spiculation/lobulation annotation is a tedious task and thus no public datasets exist to date for probing the importance of these clinically-reported features in the SOTA malignancy prediction algorithms. As part of this paper, we release a large-scale Clinically-Interpretable Radiomics Dataset, CIRDataset, containing 956 radiologist QA/QC'ed spiculation/lobulation annotations on segmented lung nodules from two public datasets, LIDC-IDRI (N=883) and LUNGx (N=73). We also present an end-to-end deep learning model based on multi-class Voxel2Mesh extension to segment nodules (while preserving spikes), classify spikes (sharp/spiculation and curved/lobulation), and perform malignancy prediction. Previous methods have performed malignancy prediction for LIDC and LUNGx datasets but without robust attribution to any clinically reported/actionable features (due to known hyperparameter sensitivity issues with general attribution schemes). With the release of this comprehensively-annotated CIRDataset and end-to-end deep learning baseline, we hope that malignancy prediction methods can validate their explanations, benchmark against our baseline, and provide clinically-actionable insights. Dataset, code, pretrained models, and docker containers are available at https://github.com/nadeemlab/CIR.