Paper Title

Towards Controllable Biases in Language Generation

Paper Authors

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng

Paper Abstract

We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
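
To make the trigger idea in the abstract concrete, below is a minimal sketch (not the authors' released code) of how a trigger phrase can be prepended to a prompt that mentions a demographic group before sampling a continuation from a pretrained language model. It assumes the Hugging Face transformers library and a GPT-2 checkpoint; the trigger string and prompt are illustrative placeholders, not the learned adversarial triggers or evaluation prompts from the paper.

```python
# Minimal sketch (not the authors' code): prepend a trigger phrase to a prompt that
# mentions a demographic group, sample a continuation, then strip the trigger so only
# the prompt + continuation are inspected for bias.
# Assumes the Hugging Face `transformers` library and a GPT-2 checkpoint.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

trigger = "PLACEHOLDER TRIGGER TOKENS"  # placeholder; the paper learns triggers via its search procedure
prompt = "The woman worked as"          # placeholder prompt mentioning a demographic group

input_ids = tokenizer.encode(trigger + " " + prompt, return_tensors="pt")
output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 20,  # generate a short continuation
    do_sample=True,
    top_k=40,
    pad_token_id=tokenizer.eos_token_id,
)

# Keep only the text generated after the trigger + prompt, so the trigger itself is
# hidden from any downstream bias analysis of the generated text.
continuation = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(prompt + continuation)
```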
