Paper Title
Demonstrating the Risk of Imbalanced Datasets in Chest X-ray Image-based Diagnostics by Prototypical Relevance Propagation
Paper Authors
Paper Abstract
The recent trend of integrating multi-source chest X-ray datasets to improve automated diagnostics raises concerns that models learn to exploit source-specific correlations, improving performance by recognizing the source domain of an image rather than the medical pathology. We hypothesize that this effect is enforced by, and leverages, label imbalance across the source domains, i.e., the prevalence of a disease corresponding to a source. Therefore, in this work, we perform a thorough study of the effect of label imbalance in multi-source training for the task of pneumonia detection on the widely used ChestX-ray14 and CheXpert datasets. The results highlight and stress the importance of using more faithful and transparent self-explaining models for automated diagnosis, thus enabling the inherent detection of spurious learning. They further illustrate that this undesirable effect of learning spurious correlations can be reduced considerably when ensuring label-balanced source domain datasets.
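To make the notion of label imbalance across source domains concrete, the following minimal Python sketch measures disease prevalence per source and then equalizes it by downsampling. It is an illustration only: the abstract does not describe the authors' balancing procedure, so the function names (`prevalence_by_source`, `balance_labels_per_source`), the `(source, label)` record format, and the choice of majority-class downsampling are all assumptions.

```python
import random
from collections import defaultdict


def prevalence_by_source(records):
    """Return {source: fraction of positive labels} for (source, label) pairs.

    Labels are assumed to be 0 (negative) or 1 (positive).
    """
    counts = defaultdict(lambda: [0, 0])  # source -> [negatives, positives]
    for source, label in records:
        counts[source][label] += 1
    return {s: pos / (neg + pos) for s, (neg, pos) in counts.items()}


def balance_labels_per_source(records, seed=0):
    """Downsample the majority class within each source (hypothetical helper).

    After balancing, every source has the same prevalence (0.5), so the
    source identity alone no longer predicts the label.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)  # (source, label) -> records
    for source, label in records:
        groups[(source, label)].append((source, label))

    balanced = []
    for source in sorted({s for s, _ in groups}):
        negatives = groups.get((source, 0), [])
        positives = groups.get((source, 1), [])
        n = min(len(negatives), len(positives))
        balanced += rng.sample(negatives, n) + rng.sample(positives, n)
    return balanced
```

With imbalanced input, for example one source that is mostly positive and another that is mostly negative, `prevalence_by_source` shows how strongly the source predicts the label, and the same call on the output of `balance_labels_per_source` shows that this shortcut has been removed. Downsampling discards data, so reweighting or stratified sampling may be preferable in practice.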