Paper Title
On the Information Content of Predictions in Word Analogy Tests
Authors
Abstract
An approach is proposed to quantify, in bits of information, the actual relevance of analogies in analogy tests. The main component of this approach is a softaccuracy estimator that also yields entropy estimates with compensated biases. Experimental results obtained with pre-trained GloVe 300-D vectors and two public analogy test sets show that proximity hints are much more relevant than analogies in analogy tests, from an information content perspective. Accordingly, a simple word embedding model is used to predict that analogies carry about one bit of information, which is experimentally corroborated.
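To make the contrast between proximity hints and analogies concrete, the following is a minimal sketch, not the paper's softaccuracy estimator. It assumes made-up 3-D toy vectors instead of GloVe 300-D vectors, the common vector-offset rule (c + b - a) for analogy prediction, and a softmax over cosine similarities with a hypothetical `temperature` parameter to turn similarities into probabilities. It compares, in bits, how surprising the correct answer is when only proximity to the query word is used versus when the analogy offset is used.

```python
import math

# Toy 3-D embeddings with made-up values (an assumption; NOT GloVe vectors).
EMB = {
    "man":    [1.0, 0.0, 0.2],
    "woman":  [1.0, 1.0, 0.2],
    "king":   [1.0, 0.0, 1.0],
    "queen":  [1.0, 1.0, 1.0],
    "prince": [1.0, -0.2, 1.0],
    "apple":  [-1.0, 0.3, -0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def candidate_probs(target, exclude, temperature=0.1):
    """Softmax over cosine similarities to `target` (hypothetical scoring rule).

    Words in `exclude` (the three query words) are not candidates.
    """
    cands = [w for w in EMB if w not in exclude]
    logits = [cosine(EMB[w], target) / temperature for w in cands]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return {w: e / total for w, e in zip(cands, exps)}


def surprisal_bits(p):
    """Information content, in bits, of an outcome with probability p."""
    return -math.log2(p)


# Analogy question "man is to woman as king is to ?", expected answer "queen".
a, b, c, d = "man", "woman", "king", "queen"
exclude = {a, b, c}

# Proximity hint only: candidates are scored by closeness to the query word c.
prox = candidate_probs(EMB[c], exclude)

# Analogy: candidates are scored by closeness to the offset vector c + b - a.
offset = [ci + bi - ai for ai, bi, ci in zip(EMB[a], EMB[b], EMB[c])]
ana = candidate_probs(offset, exclude)

# Reduction in surprisal of the correct answer when the analogy is used.
gain_bits = surprisal_bits(prox[d]) - surprisal_bits(ana[d])

print("P(queen | proximity only):", round(prox[d], 3))
print("P(queen | analogy offset):", round(ana[d], 3))
print("gain from analogy (bits): ", round(gain_bits, 3))
```

The size of the gain here depends entirely on the toy vectors and the chosen temperature, so it says nothing about the values reported in the paper; the sketch only shows how a prediction's information content can be expressed in bits.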