Title
Cross-lingual Word Embeddings in Hyperbolic Space
Authors
Abstract
Cross-lingual word embeddings can be applied to a range of natural language processing applications across multiple languages. Unlike prior work that uses word embeddings based on Euclidean space, this short paper presents a simple and effective cross-lingual Word2Vec model adapted to the Poincaré ball model of hyperbolic space, which learns unsupervised cross-lingual word representations from a German-English parallel corpus. Hyperbolic embeddings have been shown to capture and preserve hierarchical relationships. We evaluate the model on both hypernymy and analogy tasks. On the cross-lingual analogy task, the proposed model achieves performance comparable to that of the vanilla Word2Vec model. On the hypernymy task, the cross-lingual Poincaré Word2Vec model captures latent hierarchical structure from free text across languages, structure that is absent from the Euclidean-based Word2Vec representations. Our results show that, by preserving latent hierarchical information, hyperbolic spaces can offer better representations for cross-lingual embeddings.
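The abstract does not spell out how Word2Vec is adapted to the Poincaré ball. The sketch below is therefore only an assumption-based illustration of two ingredients that hyperbolic embedding models commonly use, following the Poincaré embeddings of Nickel and Kiela (2017):

- the geodesic distance inside the unit ball;
- a Riemannian SGD step that rescales the Euclidean gradient and keeps the point inside the ball.

It is not the authors' implementation. The helper names `squared_norm`, `poincare_distance`, and `riemannian_sgd_step` are hypothetical, and the toy points are made up for illustration.

```python
import math

# Illustrative sketch only (assumption): generic Poincaré-ball utilities,
# not the paper's actual training objective or code.


def squared_norm(x):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(xi * xi for xi in x)


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - squared_norm(u)) * (1.0 - squared_norm(v))
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


def riemannian_sgd_step(theta, euclidean_grad, lr, eps=1e-5):
    """One Riemannian SGD step on the Poincaré ball.

    The Euclidean gradient is rescaled by the inverse of the Poincaré metric,
    (1 - ||theta||^2)^2 / 4, and the result is projected back inside the ball
    if the step would leave it.
    """
    scale = (1.0 - squared_norm(theta)) ** 2 / 4.0
    updated = [t - lr * scale * g for t, g in zip(theta, euclidean_grad)]
    norm = math.sqrt(squared_norm(updated))
    if norm >= 1.0:
        # Pull the point back to just inside the boundary of the unit ball.
        updated = [x / norm * (1.0 - eps) for x in updated]
    return updated


if __name__ == "__main__":
    # Hypothetical toy points: a general term at the origin and two
    # specific terms near the boundary of the ball.
    general = [0.0, 0.0]
    specific_a = [0.9, 0.0]
    specific_b = [0.0, 0.9]
    print(poincare_distance(general, specific_a))     # about 2.94
    print(poincare_distance(specific_a, specific_b))  # about 5.20
```

Distances grow rapidly near the boundary, so two points close to the edge are far apart even when their Euclidean distance is modest. A point near the origin, by contrast, stays relatively close to all of them. This property is why hierarchy-preserving hyperbolic embeddings tend to place general terms near the center and more specific terms toward the edge.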