Paper title
Feature robustness and sex differences in medical imaging: a case study in MRI-based Alzheimer's disease detection
Paper authors
Paper abstract
Convolutional neural networks have enabled significant improvements in medical image-based diagnosis. It is, however, increasingly clear that these models are susceptible to performance degradation when facing spurious correlations and dataset shift, leading, e.g., to underperformance on underrepresented patient groups. In this paper, we compare two classification schemes on the ADNI MRI dataset: a simple logistic regression model using manually selected volumetric features, and a convolutional neural network trained on 3D MRI data. We assess the robustness of the trained models in the face of varying dataset splits, training set sex composition, and stage of disease. In contrast to earlier work in other imaging modalities, we do not observe a clear pattern of improved model performance for the majority group in the training dataset. Instead, while logistic regression is fully robust to dataset composition, we find that CNN performance is generally improved for both male and female subjects when including more female subjects in the training dataset. We hypothesize that this might be due to inherent differences in the pathology of the two sexes. Moreover, in our analysis, the logistic regression model outperforms the 3D CNN, emphasizing the utility of manual feature specification based on prior knowledge, and the need for more robust automatic feature selection.
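The abstract describes varying the sex composition of the training set while assessing model robustness. The following is a minimal sketch of how such training subsets could be drawn at a fixed size with a controlled female fraction. It is not the authors' protocol, which the abstract does not specify: the function `sample_by_sex_ratio`, the toy cohort, the subset size, and the ratios are all hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def sample_by_sex_ratio(subjects, n_train, female_fraction, seed=0):
    """Draw n_train subjects so that a given fraction of them is female.

    subjects: list of (subject_id, sex) tuples with sex in {"F", "M"}.
    This helper is hypothetical and not taken from the paper.
    """
    if not 0.0 <= female_fraction <= 1.0:
        raise ValueError("female_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    females = [s for s in subjects if s[1] == "F"]
    males = [s for s in subjects if s[1] == "M"]
    n_female = round(n_train * female_fraction)
    n_male = n_train - n_female
    if n_female > len(females) or n_male > len(males):
        raise ValueError("not enough subjects of one sex for this ratio")
    # Sample without replacement within each sex, then combine.
    return rng.sample(females, n_female) + rng.sample(males, n_male)


# Hypothetical toy cohort: 100 female and 100 male subject IDs.
cohort = [(f"F{i:03d}", "F") for i in range(100)] + \
         [(f"M{i:03d}", "M") for i in range(100)]

# Fixed training-set size, varying female fraction (illustrative ratios).
splits = {
    ratio: sample_by_sex_ratio(cohort, n_train=80, female_fraction=ratio, seed=42)
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0)
}

for ratio, train in splits.items():
    counts = Counter(sex for _, sex in train)
    print(ratio, counts["F"], counts["M"])
```

Holding the subset size constant across ratios means that any change in per-sex test performance can be attributed to composition rather than to the amount of training data. Each subset would then be used to train both classifiers, with performance reported separately for male and female test subjects.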