Paper Title
Multi-Dimensional Gender Bias Classification
Paper Authors
Paper Abstract
Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when trained on gender-biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information. In addition, we collect a novel, crowdsourced evaluation benchmark of utterance-level gender rewrites. Distinguishing between gender bias along multiple dimensions is important, as it enables us to train finer-grained gender bias classifiers. We show that our classifiers prove valuable for a variety of important applications, such as controlling for gender bias in generative models, detecting gender bias in arbitrary text, and shedding light on offensive language in terms of genderedness.