Paper Title

RPD: A Distance Function Between Word Embeddings

Authors

Xuhui Zhou, Zaixiang Zheng, Shujian Huang

Abstract

It is well-understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e. how far different sets of embeddings deviate from each other. In this paper, we propose a novel metric called Relative pairwise inner Product Distance (RPD) to quantify the distance between different sets of word embeddings. This metric has a unified scale for comparing different sets of word embeddings. Based on the properties of RPD, we study the relations of word embeddings of different algorithms systematically and investigate the influence of different training processes and corpora. The results shed light on the poorly understood word embeddings and justify RPD as a measure of the distance of embedding spaces.
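
The abstract names the metric but does not state its formula. As a rough illustration only, the sketch below assumes that RPD compares the pairwise inner product (Gram) matrices of two embedding sets over the same vocabulary and normalizes the squared Frobenius distance by the product of their norms. The function name `rpd_sketch`, the factor of 1/2, and the absence of any row normalization are assumptions made for this example, not details taken from the text above.

```python
import numpy as np


def rpd_sketch(e1, e2):
    """Hypothetical relative pairwise inner product distance.

    e1, e2: arrays of shape (vocab_size, dim1) and (vocab_size, dim2),
    with rows aligned to the same vocabulary. The exact normalization
    used here is an assumption, not the paper's stated definition.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    if e1.shape[0] != e2.shape[0]:
        raise ValueError("both embedding sets must cover the same vocabulary")

    # Pairwise inner products between all word vectors in each space.
    g1 = e1 @ e1.T
    g2 = e2 @ e2.T

    # Squared Frobenius distance between the Gram matrices, made relative
    # by dividing by the product of their Frobenius norms.
    diff = np.linalg.norm(g1 - g2, ord="fro") ** 2
    scale = np.linalg.norm(g1, ord="fro") * np.linalg.norm(g2, ord="fro")
    return 0.5 * diff / scale
```

Because only inner products enter the computation, a quantity of this form is unchanged when either embedding set is rotated by an orthogonal matrix, and the two sets may have different dimensionalities as long as their rows refer to the same words.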
