Paper title
The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks
Paper authors
Paper abstract
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI), we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
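As a rough illustration of how one innocuous construction choice, random sampling, can move a single reported number, the sketch below repeatedly keeps a random subset of a benchmark's examples and recomputes an aggregate score on each subset. This is our own minimal sketch, not the authors' code: the per-example 0/1 outcomes, the `bias_score` definition (fraction of examples with outcome 1) and the helper `resampled_scores` are hypothetical simplifications, not the actual Winogender or BiasNLI metrics.

```python
import random
import statistics


def bias_score(outcomes: list[int]) -> float:
    """Hypothetical aggregate score: the fraction of examples whose outcome
    is 1 (e.g. 'model chose the stereotypical answer')."""
    return sum(outcomes) / len(outcomes)


def resampled_scores(outcomes: list[int], fraction: float,
                     n_trials: int, seed: int = 0) -> list[float]:
    """Simulate alternative constructions of the same benchmark by keeping a
    random subset (sampled without replacement) of `fraction` of the examples,
    then recomputing the score on each subset."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    k = max(1, int(len(outcomes) * fraction))
    return [bias_score(rng.sample(outcomes, k)) for _ in range(n_trials)]


if __name__ == "__main__":
    # Toy, made-up outcomes for 100 examples; not data from the paper.
    outcomes = [1] * 60 + [0] * 40
    scores = resampled_scores(outcomes, fraction=0.2, n_trials=500)
    print(f"full-benchmark score: {bias_score(outcomes):.2f}")
    print(f"subset scores range: {min(scores):.2f} to {max(scores):.2f}")
    print(f"std. dev. across subsets: {statistics.pstdev(scores):.3f}")
```

Every subset is drawn from the same pool of examples, so each alternative "benchmark" keeps the same underlying content, yet the spread of subset scores shows how much the headline number can depend on which examples happened to be included. The paper itself studies richer modifications (such as paraphrasing) on real models.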