Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

Title

Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

Authors

Zhang, Peiying; Huang, Xingzhe; Wang, Yaqi; Jiang, Chunxiao; He, Shuqing; Wang, Haifeng

Abstract

Natural language processing (NLP) has achieved excellent performance in many fields, including semantic understanding, automatic summarization, and image recognition. However, most neural network models for NLP extract text features at a fine granularity, which makes it difficult to grasp the meaning of a text from a global perspective. To alleviate this problem, this paper combines traditional statistical methods with deep learning models and proposes a novel model based on multi-model nonlinear fusion. The model measures sentence similarity separately with three methods: a Jaccard coefficient based on part of speech, Term Frequency-Inverse Document Frequency (TF-IDF), and a word2vec-CNN algorithm. A normalized weight coefficient is derived for each model from its calculation accuracy, and the models' results are compared. The resulting weighted vector is fed into a fully connected neural network, which produces the final classification result. Because the statistical sentence similarity algorithms reduce the granularity of feature extraction, the model can capture sentence features globally. Experimental results show that the sentence similarity calculation method based on multi-model nonlinear fusion achieves a matching rate of 84% and an F1 score of 75%.
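The fusion pipeline described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation, and it rests on several assumptions the abstract does not specify: sentences arrive already tokenized and POS-tagged, the part-of-speech Jaccard keeps only an assumed set of content tags, TF-IDF uses a plain idf = log(N / df) with cosine similarity, each weight is the model's accuracy divided by the sum of all accuracies, the word2vec-CNN score is a fixed placeholder number, and a single sigmoid unit with made-up parameters stands in for the trained fully connected network. All function names and numeric values below are hypothetical.

```python
import math
from collections import Counter

# Assumption: sentences arrive already tokenized and POS-tagged as (word, tag)
# pairs; which tags count as "content" is not specified, so this set is hypothetical.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def pos_jaccard(a, b, tags=CONTENT_TAGS):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over words whose POS tag is in `tags`."""
    sa = {w for w, t in a if t in tags}
    sb = {w for w, t in b if t in tags}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tfidf_cosine(a, b, corpus):
    """Cosine similarity of TF-IDF vectors, with idf = log(N / df) over `corpus`."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))

    def vec(doc):
        tf = Counter(doc)
        # Words absent from the corpus (df == 0) are skipped to avoid division by zero.
        return {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items() if df[w]}

    va, vb = vec(a), vec(b)
    dot = sum(x * vb.get(w, 0.0) for w, x in va.items())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalized_weights(accuracies):
    """Assumed scheme: each weight is the model's accuracy divided by the sum of accuracies."""
    total = sum(accuracies)
    return [acc / total for acc in accuracies]


def weighted_vector(scores, weights):
    """Element-wise product of per-model similarity scores and their weights."""
    return [s * w for s, w in zip(scores, weights)]


def dense_sigmoid(x, weights, bias):
    """A single fully connected unit with a sigmoid output; a stand-in for the trained network."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    s1 = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")]
    s2 = [("a", "DET"), ("cat", "NOUN"), ("slept", "VERB")]
    corpus = [["the", "cat", "sat"], ["a", "cat", "slept"], ["the", "dog", "barked"]]

    jac = pos_jaccard(s1, s2)  # content words {cat, sat} vs {cat, slept} -> 1/3
    tfidf = tfidf_cosine([w for w, _ in s1], [w for w, _ in s2], corpus)
    cnn = 0.6  # hypothetical placeholder for a word2vec-CNN similarity score

    w = normalized_weights([0.70, 0.75, 0.80])  # hypothetical per-model accuracies
    x = weighted_vector([jac, tfidf, cnn], w)
    print(dense_sigmoid(x, weights=[1.0, 1.0, 1.0], bias=-0.5))  # untrained, illustrative
```

Weighting each score by normalized accuracy lets the more reliable models contribute more before the nonlinear layer sees the vector. In a real system the word2vec-CNN score would come from a trained network, and the fully connected layer's weights and bias would be learned from labeled sentence pairs rather than set by hand.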
