Paper Title
Optimising Equal Opportunity Fairness in Model Training
Paper Authors
Paper Abstract
Real-world datasets often encode stereotypes and societal biases. Such biases can be implicitly captured by trained models, leading to biased predictions and exacerbating existing societal preconceptions. Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias. However, a disconnect between fairness criteria and training objectives makes it difficult to reason theoretically about the effectiveness of different techniques. In this work, we propose two novel training objectives which directly optimise for the widely-used criterion of {\it equal opportunity}, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
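The equal opportunity criterion the abstract targets requires equal true positive rates across protected groups. A minimal sketch of how the corresponding fairness gap is typically measured (the function name and the two-group setup are illustrative assumptions, not the paper's actual objective):

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute TPR difference between two protected groups.

    Equal opportunity is satisfied when the true positive rate is
    identical for both groups, i.e. the gap is zero.
    y_true, y_pred: binary label/prediction arrays; group: binary
    protected-attribute array (illustrative two-group case).
    """
    tprs = []
    for g in (0, 1):
        # Positive-class instances belonging to group g
        mask = (group == g) & (y_true == 1)
        # TPR for group g: fraction of its positives predicted positive
        tprs.append(y_pred[mask].mean())
    return abs(tprs[0] - tprs[1])

# Toy example: group 0's positives are all classified correctly,
# group 1's only half, giving a gap of 0.5.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1])
group  = np.array([0, 0, 1, 1, 0, 1])
print(equal_opportunity_gap(y_true, y_pred, group))  # → 0.5
```

The paper's contribution is to turn this criterion into differentiable training objectives rather than only an evaluation metric; the hard indicator above would need a smooth surrogate (e.g. predicted probabilities in place of thresholded labels) to be optimised directly.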