Debiasing Word Embeddings with Nonlinear Geometry

Paper Title

Debiasing Word Embeddings with Nonlinear Geometry

Authors

Lu Cheng, Nayoung Kim, Huan Liu

Abstract

Debiasing word embeddings has been largely limited to individual and independent social categories. However, real-world corpora typically present multiple social categories that possibly correlate or intersect with each other. For instance, "hair weaves" is stereotypically associated with African American females, but neither African American nor females alone. Therefore, this work studies biases associated with multiple social categories: joint biases induced by the union of different categories and intersectional biases that do not overlap with the biases of the constituent categories. We first empirically observe that individual biases intersect non-trivially (i.e., over a one-dimensional subspace). Drawing from the intersectional theory in social science and the linguistic theory, we then construct an intersectional subspace to debias for multiple social categories using the nonlinear geometry of individual biases. Empirical evaluations corroborate the efficacy of our approach. Data and implementation code can be downloaded at https://github.com/GitHubLuCheng/Implementation-of-JoSEC-COLING-22.
