Paper Title
CheXclusion: Fairness gaps in deep chest X-ray classifiers
Paper Authors
Paper Abstract
Machine learning systems have received much attention recently for their ability to achieve expert-level performance on clinical tasks, particularly in medical imaging. Here, we examine the extent to which state-of-the-art deep learning classifiers trained to yield diagnostic labels from X-ray images are biased with respect to protected attributes. We train convolutional neural networks to predict 14 diagnostic labels in 3 prominent public chest X-ray datasets: MIMIC-CXR, Chest-Xray8, CheXpert, as well as a multi-site aggregation of all those datasets. We evaluate the TPR disparity -- the difference in true positive rates (TPR) -- among different protected attributes such as patient sex, age, race, and insurance type as a proxy for socioeconomic status. We demonstrate that TPR disparities exist in the state-of-the-art classifiers in all datasets, for all clinical tasks, and all subgroups. A multi-source dataset corresponds to the smallest disparities, suggesting one way to reduce bias. We find that TPR disparities are not significantly correlated with a subgroup's proportional disease burden. As clinical models move from papers to products, we encourage clinical decision makers to carefully audit for algorithmic disparities prior to deployment. Our code can be found at https://github.com/LalehSeyyed/CheXclusion
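The core audit quantity in the abstract is the TPR disparity. As a minimal sketch (not the paper's exact implementation), the snippet below computes each subgroup's TPR and reports its difference from the overall TPR; the function names, the toy data, and the choice of "subgroup TPR minus overall TPR" as the disparity notion are illustrative assumptions.

```python
# Illustrative TPR-disparity audit for one binary diagnostic label.
# Assumes 0/1 ground-truth labels, 0/1 predictions, and a subgroup id
# (e.g. patient sex) for every example.
from collections import defaultdict

def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN), computed over the positive cases only."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return None  # TPR is undefined when a group has no positive cases
    return sum(p for _, p in positives) / len(positives)

def tpr_disparities(y_true, y_pred, groups):
    """Per-subgroup TPR minus the overall TPR (one possible disparity notion)."""
    overall = true_positive_rate(y_true, y_pred)
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {
        g: true_positive_rate(ts, ps) - overall
        for g, (ts, ps) in by_group.items()
        if true_positive_rate(ts, ps) is not None
    }

# Toy example: overall TPR is 0.75, but the two subgroups differ.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1]
groups = ["F", "F", "M", "M", "F", "M"]
print(tpr_disparities(y_true, y_pred, groups))  # {'F': -0.25, 'M': 0.25}
```

In practice such an audit would be run per diagnostic label and per protected attribute (sex, age bucket, race, insurance type), and the nonzero gaps here mirror the kind of disparity the paper reports.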