Paper Title

Investigating Gender Bias in BERT

Paper Authors

Rishabh Bhardwaj, Navonil Majumder, Soujanya Poria

Paper Abstract

Contextual language models (CLMs) have pushed NLP benchmarks to new heights. It has become the new norm to utilize CLM-provided word embeddings in downstream tasks such as text classification. However, unless addressed, CLMs are prone to learning the intrinsic gender bias in the dataset. As a result, the predictions of downstream NLP models can vary noticeably when gender words are varied, such as replacing "he" with "she", or even when gender-neutral words are changed. In this paper, we focus our analysis on a popular CLM, i.e., BERT. We analyse the gender bias it induces in five downstream tasks related to emotion and sentiment intensity prediction. For each task, we train a simple regressor utilizing BERT's word embeddings. We then evaluate the gender bias in the regressors using an equity evaluation corpus. Ideally, and by design, the models should discard gender-informative features from the input. However, the results show a significant dependence of the system's predictions on gender-particular words and phrases. We claim that such biases can be reduced by removing gender-specific features from the word embeddings. Hence, for each layer in BERT, we identify directions that primarily encode gender information. The space formed by such directions is referred to as the gender subspace in the semantic space of word embeddings. We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer. This obviates the need to realize the gender subspace in multiple dimensions and prevents other crucial information from being omitted. Experiments show that removing embedding components along such directions is highly effective at reducing BERT-induced bias in the downstream tasks.
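
The debiasing idea described in the abstract lends itself to a short illustration. The sketch below is not the authors' exact algorithm: it simply estimates one gender direction per layer as the top principal component of embedding differences between illustrative gendered word pairs (e.g., "he"/"she"), then projects that component out of the token embeddings before they reach a downstream regressor. The function names, the word-pair choice, and the use of PCA/SVD are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's exact algorithm): estimate a per-layer
# "gender direction" and remove its component from every embedding.
import numpy as np

def gender_direction(layer_embs_a, layer_embs_b):
    """Estimate a single gender direction for one BERT layer.

    layer_embs_a / layer_embs_b: arrays of shape (n_pairs, hidden_dim) holding
    that layer's embeddings for paired words such as ("he", "she"), ("man", "woman").
    Here the direction is the top principal component of the pairwise differences;
    the paper proposes its own algorithm for a fine-grained per-layer direction.
    """
    diffs = layer_embs_a - layer_embs_b                 # (n_pairs, hidden_dim)
    diffs = diffs - diffs.mean(axis=0, keepdims=True)   # centre before PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    d = vt[0]                                           # top right-singular vector
    return d / np.linalg.norm(d)

def remove_gender_component(embeddings, direction):
    """Project embeddings onto the orthogonal complement of the gender direction."""
    proj = embeddings @ direction                       # (n_tokens,)
    return embeddings - np.outer(proj, direction)       # (n_tokens, hidden_dim)

if __name__ == "__main__":
    # Toy usage with random vectors standing in for one layer's BERT embeddings.
    rng = np.random.default_rng(0)
    he_embs, she_embs = rng.normal(size=(8, 768)), rng.normal(size=(8, 768))
    d = gender_direction(he_embs, she_embs)
    tokens = rng.normal(size=(5, 768))
    debiased = remove_gender_component(tokens, d)
    assert np.allclose(debiased @ d, 0.0, atol=1e-6)    # no residual gender component
```

In this sketch one such direction would be computed and removed independently for each of BERT's layers, mirroring the paper's one-primary-direction-per-layer formulation, before the debiased embeddings are fed to the downstream emotion or sentiment intensity regressor.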
