Paper title
Argument from Old Man's View: Assessing Social Bias in Argumentation
Paper authors
Paper abstract
Social bias in language - towards genders, ethnicities, ages, and other social groups - poses a problem with ethical impact for many NLP applications. Recent research has shown that machine learning models trained on respective data may not only adopt, but even amplify the bias. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an existing metric to measure bias in word embeddings. In a word co-occurrence analysis, we then investigate causes of bias. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of male people with European-American names. Our empirical insights contribute towards an understanding of bias in argumentative data sources.
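For readers unfamiliar with WEAT, the following is a minimal sketch of the effect size it reports, under stated assumptions. It is not the authors' code. The function names (`cosine`, `association`, `weat_effect_size`), the word lists, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration; a real evaluation would use embeddings trained on the debate corpora and the published WEAT word lists. The sketch also assumes the sample standard deviation in the denominator.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B, emb):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    sim_a = mean(cosine(emb[w], emb[a]) for a in A)
    sim_b = mean(cosine(emb[w], emb[b]) for b in B)
    return sim_a - sim_b


def weat_effect_size(X, Y, A, B, emb):
    """Difference of mean associations of target sets X and Y,
    normalized by the standard deviation over all targets.
    Assumption: sample standard deviation (statistics.stdev)."""
    s_x = [association(x, A, B, emb) for x in X]
    s_y = [association(y, A, B, emb) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings, NOT taken from the paper.
emb = {
    "john": [1.0, 0.1], "paul": [0.9, 0.2],
    "amy": [0.1, 1.0], "lisa": [0.2, 0.9],
    "career": [1.0, 0.0], "salary": [0.9, 0.1],
    "home": [0.0, 1.0], "family": [0.1, 0.9],
}
X, Y = ["john", "paul"], ["amy", "lisa"]          # target word sets
A, B = ["career", "salary"], ["home", "family"]   # attribute word sets

d = weat_effect_size(X, Y, A, B, emb)
print(d)
```

A positive value indicates that the first target set (X) is more strongly associated with the first attribute set (A) than the second target set (Y) is; swapping X and Y flips the sign.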