Paper Title

Synonym Detection Using Syntactic Dependency And Neural Embeddings

Authors

Dongqiang Yang, Pikun Wang, Xiaodong Sun, Ning Li

Abstract

Recent advances in the Vector Space Model have significantly improved some NLP applications such as neural machine translation and natural language generation. Although word co-occurrences in context have been widely used in counting-/predicting-based distributional models, the role of syntactic dependencies in deriving distributional semantics has not yet been thoroughly investigated. By comparing various Vector Space Models in detecting synonyms in TOEFL, we systematically study the salience of syntactic dependencies in accounting for distributional similarity. We separate syntactic dependencies into different groups according to their various grammatical roles and then use context-counting to construct their corresponding raw and SVD-compressed matrices. Moreover, using the same training hyperparameters and corpora, we study typical neural embeddings in the evaluation. We further study the effectiveness of injecting human-compiled semantic knowledge into neural embeddings on computing distributional similarity. Our results show that the syntactically conditioned contexts can interpret lexical semantics better than the unconditioned ones, whereas retrofitting neural embeddings with semantic knowledge can significantly improve synonym detection.
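The pipeline the abstract outlines — counting syntactically conditioned co-occurrences, compressing the raw count matrix with SVD, and ranking synonym candidates by distributional similarity — can be sketched as follows. This is a minimal illustration with invented toy counts and dependency labels, not the paper's actual corpora, dependency groupings, or hyperparameters.

```python
# Hedged sketch of dependency-conditioned counting + SVD + cosine ranking.
# All data below is hypothetical; the paper uses real parsed corpora.
import numpy as np

# Rows: target words. Columns: (dependency relation, context word) features,
# i.e. contexts conditioned on the grammatical role they fill.
vocab = ["car", "automobile", "banana"]
features = [("amod", "fast"), ("amod", "yellow"), ("dobj", "drive")]
counts = np.array([
    [5.0, 0.0, 9.0],   # "car"
    [4.0, 0.0, 7.0],   # "automobile"
    [0.0, 8.0, 1.0],   # "banana"
])

# SVD-compress the raw matrix to k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
compressed = U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# TOEFL-style item: pick the candidate most similar to the probe word.
probe = compressed[vocab.index("car")]
candidates = ["automobile", "banana"]
best = max(candidates, key=lambda w: cosine(probe, compressed[vocab.index(w)]))
print(best)  # "automobile": it shares dependency-conditioned contexts with "car"
```

The same cosine-ranking step applies unchanged to neural embeddings (e.g. word2vec vectors), which is what allows the paper to compare counting-based and predicting-based models under one evaluation.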
