Paper Title

Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes

Paper Authors

Kennedy, Noel, Schofield, Imogen, Brodbelt, Dave C., Church, David B., O'Neill, Dan G.

Abstract

We experiment with new methods for learning how related words are positioned relative to each other in word embedding spaces. Previous approaches learned constant vector offsets: vectors that point from source tokens to target tokens, with an assumption that these offsets were parallel to each other. We show that the offsets between related tokens are closer to orthogonal than parallel, and that they have low cosine similarities. We proceed by making a different assumption: target tokens are linearly separable from source and unlabeled tokens. We show that a max-margin hyperplane can separate target tokens and that vectors orthogonal to this hyperplane represent the relationship between source and targets. We find that this representation of the relationship obtains the best results in discovering linguistic regularities. We experiment with vector space models trained by a variety of algorithms (Word2vec: CBOW/skip-gram, fastText, or GloVe), and various word context choices such as linear word order, syntax dependency grammars, and with and without knowledge of word position. These experiments show that our model, SVMCos, is robust to a range of experimental choices when training word embeddings.
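The abstract describes learning a max-margin hyperplane that separates target word vectors from source and unlabeled vectors, then using a vector orthogonal to that hyperplane as the representation of the relation. Below is a minimal sketch of that idea, not the authors' released code: the use of scikit-learn's LinearSVC, the function names, and the toy data are assumptions made for illustration only.

```python
# Hedged sketch of the SVMCos-style idea described in the abstract:
# 1) fit a max-margin linear classifier separating target embeddings from
#    source + unlabeled embeddings, 2) take the hyperplane normal (the vector
#    orthogonal to the separating hyperplane) as the relation direction,
# 3) rank candidate words by cosine similarity with that direction.
import numpy as np
from sklearn.svm import LinearSVC


def relation_direction(target_vecs, other_vecs):
    """Fit a linear max-margin classifier and return its unit normal vector.

    target_vecs: embeddings of target tokens, shape (n_targets, dim)
    other_vecs:  embeddings of source + unlabeled tokens, shape (n_others, dim)
    """
    X = np.vstack([target_vecs, other_vecs])
    y = np.array([1] * len(target_vecs) + [0] * len(other_vecs))
    svm = LinearSVC(C=1.0).fit(X, y)        # max-margin separating hyperplane
    w = svm.coef_[0]                        # vector orthogonal to the hyperplane
    return w / np.linalg.norm(w)


def rank_by_cosine(direction, candidate_vecs):
    """Score candidate embeddings by cosine similarity with the relation direction."""
    norms = np.linalg.norm(candidate_vecs, axis=1)
    return (candidate_vecs @ direction) / norms


# Toy usage: random vectors stand in for real word embeddings.
rng = np.random.default_rng(0)
targets = rng.normal(size=(20, 50)) + 2.0   # targets shifted along some direction
others = rng.normal(size=(80, 50))
w = relation_direction(targets, others)
scores = rank_by_cosine(w, rng.normal(size=(5, 50)))
```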
