Paper title
Semantic properties of English nominal pluralization: Insights from word embeddings
Paper authors
Abstract
Semantic differentiation of nominal pluralization is grammaticalized in many languages. For example, plural markers may only be relevant for human nouns. English does not appear to make such distinctions. Using distributional semantics, we show that English nominal pluralization exhibits semantic clusters. For instance, the pluralizations of fruit words are more similar to one another than to the pluralizations of words from other semantic classes. Therefore, reducing the meaning shift in plural formation to the addition of an abstract plural meaning is too simplistic. We introduce a semantically informed method, called CosClassAvg, that outperforms pluralization methods in distributional semantics which assume that plural formation amounts to the addition of a fixed plural vector. In comparison with our approach, a method from compositional distributional semantics, called FRACSS, predicted plural vectors that were more similar to the corpus-extracted plural vectors in terms of direction but not vector length. A modeling study reveals that the observed difference between the two semantic spaces predicted by CosClassAvg and FRACSS carries over to how well a computational model of the listener can understand previously unencountered plural forms. Mappings from word forms, represented with triphone vectors, to predicted semantic vectors are more productive when CosClassAvg-generated semantic vectors are employed as gold standard vectors instead of FRACSS-generated vectors.
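The contrast between adding one fixed plural vector and adding a class-sensitive shift can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that a class-informed method such as CosClassAvg amounts to adding the average shift vector (plural minus singular) of the noun's semantic class, a detail the abstract does not spell out. The two-dimensional vectors, the class labels, and all function names are invented for illustration.

```python
import math
from collections import defaultdict

# Toy embeddings (made up): word -> (semantic class, singular vector, plural vector)
TOY = {
    "apple": ("fruit",   [1.0, 0.0], [1.0, 1.0]),
    "pear":  ("fruit",   [0.8, 0.2], [0.8, 1.2]),
    "car":   ("vehicle", [0.0, 1.0], [1.0, 1.0]),
    "bus":   ("vehicle", [0.2, 0.8], [1.2, 0.8]),
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(u, v):
    """Cosine similarity: compares direction and ignores vector length."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def global_shift(data):
    """One fixed plural vector: the mean shift over all nouns."""
    return mean([sub(pl, sg) for _, sg, pl in data.values()])

def class_shifts(data):
    """One average shift vector per semantic class (assumed reading of CosClassAvg)."""
    by_class = defaultdict(list)
    for cls, sg, pl in data.values():
        by_class[cls].append(sub(pl, sg))
    return {cls: mean(shifts) for cls, shifts in by_class.items()}

def predict_fixed_shift(singular, data):
    return add(singular, global_shift(data))

def predict_class_shift(singular, cls, data):
    return add(singular, class_shifts(data)[cls])

if __name__ == "__main__":
    # Held-out toy noun: a fruit word that was not used to estimate the shifts.
    plum_sg, plum_pl = [0.9, 0.1], [0.9, 1.1]
    fixed = predict_fixed_shift(plum_sg, TOY)
    by_class = predict_class_shift(plum_sg, "fruit", TOY)
    for name, pred in (("fixed shift", fixed), ("class shift", by_class)):
        print(name,
              "cosine:", round(cosine(pred, plum_pl), 3),
              "length ratio:", round(norm(pred) / norm(plum_pl), 3))
```

Reporting cosine similarity and vector length separately mirrors the distinction drawn in the abstract: a prediction can point in nearly the right direction while still having the wrong length, so the two measures need not agree.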