The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings

Paper title

The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings

Authors

Binny Mathew, Sandipan Sikdar, Florian Lemmerich, Markus Strohmaier

Abstract

We introduce POLAR, a framework that adds interpretability to pre-trained word embeddings via the adoption of semantic differentials. Semantic differentials are a psychometric construct for measuring the semantics of a word by analysing its position on a scale between two polar opposites (e.g., cold–hot, soft–hard). The core idea of our approach is to transform existing, pre-trained word embeddings via semantic differentials to a new "polar" space with interpretable dimensions defined by such polar opposites. Our framework also allows for selecting the most discriminative dimensions from a set of polar dimensions provided by an oracle, i.e., an external source. We demonstrate the effectiveness of our framework by deploying it to various downstream tasks, in which our interpretable word embeddings achieve a performance that is comparable to the original word embeddings. We also show that the interpretable dimensions selected by our framework align with human judgement. Together, these results demonstrate that interpretability can be added to word embeddings without compromising performance. Our work is relevant for researchers and engineers interested in interpreting pre-trained word embeddings.
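
As a rough illustration of the core idea in the abstract, the sketch below places words on scales defined by polar opposites. It is a simplified stand-in, not the paper's transformation: the abstract does not give the exact mapping, so plain scalar projection onto the direction between two poles is assumed here. The toy 3-dimensional vectors and the names `EMBEDDINGS`, `POLAR_PAIRS`, `polar_direction`, and `to_polar` are hypothetical.

```python
from math import sqrt

# Toy 3-dimensional embeddings, invented for illustration only.
# Real pre-trained embeddings would have hundreds of dimensions.
EMBEDDINGS = {
    "cold": [-1.0, 0.0, 0.2],
    "hot": [1.0, 0.0, 0.2],
    "soft": [0.0, -1.0, 0.1],
    "hard": [0.0, 1.0, 0.1],
    "ice": [-0.8, 0.6, 0.3],
    "fire": [0.9, -0.1, 0.3],
}

# Polar opposites that define the interpretable dimensions (assumed list).
POLAR_PAIRS = [("cold", "hot"), ("soft", "hard")]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polar_direction(left, right):
    """Unit vector pointing from the left pole to the right pole."""
    diff = [r - l for l, r in zip(EMBEDDINGS[left], EMBEDDINGS[right])]
    norm = sqrt(dot(diff, diff))
    return [d / norm for d in diff]


def to_polar(word):
    """Map a word to one score per polar pair.

    Assumption: the score is the scalar projection of the word vector
    onto the pole-to-pole direction. Negative values lean towards the
    left pole, positive values towards the right pole.
    """
    vec = EMBEDDINGS[word]
    return {
        f"{left}-{right}": dot(vec, polar_direction(left, right))
        for left, right in POLAR_PAIRS
    }


if __name__ == "__main__":
    for word in ("ice", "fire"):
        print(word, to_polar(word))
```

With these toy vectors, "ice" scores negative on the cold-hot scale and "fire" scores positive, so each output dimension can be read directly as a position between two named opposites.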
