Choose Your Lenses: Flaws in Gender Bias Evaluation

Paper title

Choose Your Lenses: Flaws in Gender Bias Evaluation

Authors

Hadas Orgad, Yonatan Belinkov

Abstract

Considerable efforts to measure and mitigate gender bias in recent years have led to the introduction of an abundance of tasks, datasets, and metrics used in this vein. In this position paper, we assess the current paradigm of gender bias evaluation and identify several flaws in it. First, we highlight the importance of extrinsic bias metrics that measure how a model's performance on some task is affected by gender, as opposed to intrinsic evaluations of model representations, which are less strongly connected to specific harms to people interacting with systems. We find that only a few extrinsic metrics are measured in most studies, although more can be measured. Second, we find that datasets and metrics are often coupled, and discuss how their coupling hinders the ability to obtain reliable conclusions, and how one may decouple them. We then investigate how the choice of the dataset and its composition, as well as the choice of the metric, affect bias measurement, finding significant variations across each of them. Finally, we propose several guidelines for more reliable gender bias evaluation.

Paper title