Paper Title

An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information

Authors

Wang, Yuhui; Du, Junping; Shao, Yingxia

Abstract

Patent texts contain a large amount of entity information. Named entity recognition can extract intellectual property entities that carry key information from these texts, helping researchers understand patent content more quickly. However, existing named entity extraction methods struggle to make full use of word-level semantic information because of changes in professional vocabulary. This paper proposes a method for extracting intellectual property entities based on the Transformer and technical word information, using the BERT language model to provide accurate word vector representations. During word vector generation, technical word information extracted by an IDCNN is added to improve the representation of intellectual property entities. Finally, a Transformer encoder with relative position encoding learns deep semantic information from the sequence of word vectors and predicts entity labels. Experimental results on public datasets and annotated patent datasets show that the method improves the accuracy of entity recognition.
