Paper Title

Measuring and Reducing Gendered Correlations in Pre-trained Models

Paper Authors

Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, Slav Petrov

Paper Abstract

Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for models with similar accuracy to encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade-offs different strategies have. With these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.
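
The abstract describes professions correlating with one gender more than another in pre-trained models. As a rough illustration of what such a measurement can look like (this is not the metric suite defined in the paper), the sketch below probes a masked language model for the gap between the probabilities of "he" and "she" filling a pronoun slot next to a profession word. The model name bert-base-uncased, the template, and the profession list are illustrative assumptions.

```python
# Minimal sketch (illustrative only, not the paper's metrics): probe a
# pre-trained masked LM for gendered correlations by comparing P(he) vs.
# P(she) in a pronoun slot adjacent to a profession word.
from transformers import pipeline

# Assumed model; any masked LM with a [MASK]-style token works similarly.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical profession list and template chosen for illustration.
professions = ["nurse", "engineer", "doctor", "receptionist"]
template = "The {p} said that [MASK] would be late."

for p in professions:
    # targets= restricts the returned candidates to the two pronouns of interest.
    results = fill(template.format(p=p), targets=["he", "she"])
    probs = {r["token_str"]: r["score"] for r in results}
    gap = probs.get("he", 0.0) - probs.get("she", 0.0)
    print(f"{p:>14}: P(he)={probs.get('he', 0.0):.3f} "
          f"P(she)={probs.get('she', 0.0):.3f} gap={gap:+.3f}")
```

A large positive or negative gap that flips sign across professions is the kind of unintended correlation the paper's metrics are designed to surface; two models with similar task accuracy can still differ substantially on such probes.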
