Paper Title
The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations
Paper Authors
Paper Abstract
Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods where a simple, human-interpretable model imitates the behavior of these blackbox models are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two different blackbox model architectures and four popular explainability methods, we find that the approximation quality of explanation models, also known as the fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, we highlight the importance of communicating details of non-zero fidelity gaps to users, since a single solution might not exist across all settings. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community.
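The abstract's central quantity, fidelity, is the rate at which a surrogate explanation model agrees with the blackbox model it imitates, and the fairness concern is the gap in that rate across protected subgroups. Below is a minimal sketch, not the authors' code, of how such a per-subgroup fidelity gap could be computed; the function names, the toy data, and the max-minus-min definition of the gap are illustrative assumptions rather than details taken from the paper.

# Minimal sketch of per-subgroup fidelity and a fidelity gap (illustrative only).
# "Fidelity" here is the fraction of points on which the explanation (surrogate)
# model's predictions match the blackbox model's predictions; the gap is the
# difference between the best- and worst-served subgroups. All data is synthetic.
import numpy as np

def fidelity(blackbox_preds: np.ndarray, explainer_preds: np.ndarray) -> float:
    """Fraction of points where the surrogate matches the blackbox."""
    return float(np.mean(blackbox_preds == explainer_preds))

def subgroup_fidelity_gap(blackbox_preds, explainer_preds, group_labels):
    """Per-group fidelity and the max-minus-min gap across groups (assumed definition)."""
    per_group = {
        g: fidelity(blackbox_preds[group_labels == g],
                    explainer_preds[group_labels == g])
        for g in np.unique(group_labels)
    }
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Toy illustration with two hypothetical subgroups "A" and "B".
rng = np.random.default_rng(0)
bb = rng.integers(0, 2, size=1000)                 # blackbox predictions
ex = np.where(rng.random(1000) < 0.9, bb, 1 - bb)  # surrogate agrees ~90% of the time
groups = rng.choice(["A", "B"], size=1000)
print(subgroup_fidelity_gap(bb, ex, groups))

In the paper's setting, bb and ex would instead come from a trained blackbox model and a post-hoc explanation method (e.g., a local surrogate), and group_labels from a protected attribute in the dataset; the audit then compares these per-group fidelities rather than overall fidelity alone.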