Back to the Future: On Potential Histories in NLP

Paper Title

Back to the Future: On Potential Histories in NLP

Authors

Zeerak Talat, Anne Lauscher

Abstract

Machine learning and NLP require the construction of datasets to train and fine-tune models. In this context, previous work has demonstrated the sensitivity of these data sets. For instance, potential societal biases in this data are likely to be encoded and to be amplified in the models we deploy. In this work, we draw from developments in the field of history and take a novel perspective on these problems: considering datasets and models through the lens of historical fiction surfaces their political nature, and affords re-configuring how we view the past, such that marginalized discourses are surfaced. Building on such insights, we argue that contemporary methods for machine learning are prejudiced towards dominant and hegemonic histories. Employing the example of neopronouns, we show that by surfacing marginalized histories within contemporary conditions, we can create models that better represent the lived realities of traditionally marginalized and excluded communities.
