Text Style Transfer for Bias Mitigation using Masked Language Modeling

Paper Title

Text Style Transfer for Bias Mitigation using Masked Language Modeling

Authors

Ewoenam Kwaku Tokpo, Toon Calders

Abstract

It is well known that textual data on the internet and other digital platforms contain significant levels of bias and stereotypes. Although many of these stereotypes and biases exist inherently in natural language, for reasons that are not necessarily malicious, there are crucial reasons to mitigate them. For one, these texts are used as training corpora for language models in salient applications such as CV screening, search engines, and chatbots, and such applications are turning out to produce discriminatory results. Also, several research findings have concluded that biased texts have significant effects on the target demographic groups. For instance, masculine-worded job advertisements tend to be less appealing to female applicants. In this paper, we present a text style transfer model that can be used to automatically debias textual data. Our style transfer model addresses limitations of many existing style transfer techniques, such as loss of content information, by combining latent content encoding with explicit keyword replacement. We will show that this technique produces better content preservation whilst maintaining good style transfer accuracy.
