Paper title
Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations
Paper authors
Abstract
Disparities in authorship and citations across gender can have substantial adverse consequences not just for the disadvantaged genders, but also for the field of study as a whole. Measuring gender gaps is a crucial step towards addressing them. In this work, we examine the percentage of female first authors and the citations to their papers in Natural Language Processing (1965 to 2019). We determine aggregate-level statistics using an existing manually curated author-gender list as well as first names strongly associated with a gender. We find that only about 29% of first authors are female and only about 25% of last authors are female. Notably, this percentage has not improved since the mid-2000s. We also show that, on average, female first authors are cited less than male first authors, even when controlling for experience and area of research. Finally, we discuss the ethical considerations involved in automatic demographic analysis.