Flexible text generation for counterfactual fairness probing

Paper title

Flexible text generation for counterfactual fairness probing

Authors

Fryer, Zee; Axelrod, Vera; Packer, Ben; Beutel, Alex; Chen, Jilin; Webster, Kellie

Abstract

A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references, and could miss issues that the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.
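To make the counterfactual test concrete, here is a minimal Python sketch of the wordlist-style baseline the abstract describes, not the paper's LLM-based method. The wordlist `SWAPS`, the helper names, and the deliberately biased `toy_score` classifier are all hypothetical and exist only for illustration.

```python
import re
from typing import Callable, Dict

# Hypothetical wordlist for illustration only; not taken from the paper.
SWAPS: Dict[str, str] = {"he": "she", "she": "he", "man": "woman", "woman": "man"}


def wordlist_counterfactual(text: str, swaps: Dict[str, str] = SWAPS) -> str:
    """Replace each whole-word wordlist term with its counterpart."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, swaps)) + r")\b", re.IGNORECASE
    )

    def repl(match: re.Match) -> str:
        word = match.group(0)
        new = swaps[word.lower()]
        # Preserve a leading capital so sentence-initial words still read well.
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(repl, text)


def score_gap(score: Callable[[str], float], text: str, counterfactual: str) -> float:
    """Absolute change in classifier score between original and counterfactual."""
    return abs(score(text) - score(counterfactual))


def toy_score(text: str) -> float:
    """Stand-in classifier, deliberately biased: scores higher if 'woman' appears."""
    return 0.9 if "woman" in text.lower() else 0.1


original = "He is a good man."
flipped = wordlist_counterfactual(original)  # "She is a good woman."
gap = score_gap(toy_score, original, flipped)
```

A large `gap` would flag a potential fairness issue for this input. The sketch also shows the limitation the abstract points to: a sentence that refers to a sensitive attribute without using a listed word passes through unchanged, so the probe never tests it.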
