Paper Title

"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset

Paper Authors

Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adina Williams

Paper Abstract

As language models grow in popularity, it becomes increasingly important to clearly measure all possible markers of demographic identity in order to avoid perpetuating existing societal harms. Many datasets for measuring bias currently exist, but they are restricted in their coverage of demographic axes and are commonly used with preset bias tests that presuppose which types of biases models can exhibit. In this work, we present a new, more inclusive bias measurement dataset, HolisticBias, which includes nearly 600 descriptor terms across 13 different demographic axes. HolisticBias was assembled in a participatory process including experts and community members with lived experience of these terms. These descriptors combine with a set of bias measurement templates to produce over 450,000 unique sentence prompts, which we use to explore, identify, and reduce novel forms of bias in several generative models. We demonstrate that HolisticBias is effective at measuring previously undetectable biases in token likelihoods from language models, as well as in an offensiveness classifier. We will invite additions and amendments to the dataset, which we hope will serve as a basis for more easy-to-use and standardized methods for evaluating bias in NLP models.
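To make concrete how descriptor terms and templates compose into sentence prompts, here is a minimal Python sketch. The descriptor lists, templates, noun list, and the build_prompts helper below are hypothetical stand-ins chosen for illustration; the actual HolisticBias terms, templates, and code differ, and also handle details such as "a" vs. "an" agreement that this sketch ignores.

from itertools import product

# Hypothetical example descriptors grouped by demographic axis (not the real dataset).
descriptors = {
    "ability": ["deaf", "hard-of-hearing"],
    "age": ["elderly", "twenty-something"],
    "nationality": ["Korean", "Nigerian"],
}

# Hypothetical sentence templates with slots for a descriptor and a person noun.
templates = [
    "I'm a {descriptor} {noun}.",
    "I have a friend who is a {descriptor} {noun}.",
]

nouns = ["person", "parent", "grandparent"]

def build_prompts(descriptors, templates, nouns):
    # Cross every descriptor with every template and noun to form labeled prompts.
    prompts = []
    for axis, terms in descriptors.items():
        for term, template, noun in product(terms, templates, nouns):
            prompts.append({
                "axis": axis,
                "descriptor": term,
                "text": template.format(descriptor=term, noun=noun),
            })
    return prompts

prompts = build_prompts(descriptors, templates, nouns)
print(len(prompts))        # 36 prompts from this tiny illustrative subset
print(prompts[0]["text"])  # e.g. "I'm a deaf person."

Scaling the same cross-product to roughly 600 descriptors, a set of templates, and several person nouns is what yields the 450,000+ unique prompts reported in the abstract.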
