Paper Title
Comparative Analysis of Word Embeddings for Capturing Word Similarities
Paper Authors
Abstract
Distributed language representation has become the most widely used technique for representing language in various natural language processing tasks. Most natural language processing models based on deep learning techniques use pre-trained distributed word representations, commonly called word embeddings. Determining the highest-quality word embeddings is of crucial importance for such models. However, selecting appropriate word embeddings is a challenging task, since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance in capturing word similarities is analysed against existing benchmark datasets of word-pair similarities. We conduct a correlation analysis between ground-truth word similarities and the similarities obtained by the different word embedding methods.
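A minimal sketch of the intrinsic evaluation described above, not the authors' code: cosine similarities between embedding vectors for benchmark word pairs are correlated with human similarity ratings using Spearman's rank correlation. The `evaluate` function, the plain `{word: vector}` dictionary, and the toy vectors and ratings are illustrative assumptions, not data or code from the paper.

```python
# Sketch of intrinsic word-embedding evaluation on a word-similarity benchmark.
# Assumes embeddings are given as a {word: vector} dict and the benchmark as
# (word1, word2, human_score) triples; values below are made-up placeholders.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(embeddings, benchmark_pairs):
    # Spearman correlation between human ratings and embedding cosine similarities.
    human, model = [], []
    for w1, w2, score in benchmark_pairs:
        if w1 in embeddings and w2 in embeddings:  # skip out-of-vocabulary pairs
            human.append(score)
            model.append(cosine(embeddings[w1], embeddings[w2]))
    rho, _ = spearmanr(human, model)
    return rho

# Toy illustration with hypothetical vectors and ratings.
embeddings = {
    "tiger": np.array([0.9, 0.1, 0.0]),
    "cat":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.0, 0.9, 0.4]),
}
benchmark = [("tiger", "cat", 7.35), ("tiger", "car", 1.5), ("cat", "car", 2.0)]
print(evaluate(embeddings, benchmark))
```

The same procedure applies unchanged to any embedding method under comparison; only the `embeddings` dictionary is swapped, while the benchmark pairs and the correlation measure stay fixed.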