Large-scale Multi-granular Concept Extraction Based on Machine Reading Comprehension

Paper Title

Large-scale Multi-granular Concept Extraction Based on Machine Reading Comprehension

Authors

Siyu Yuan, Deqing Yang, Jiaqing Liang, Jilun Sun, Jingyue Huang, Kaiyan Cao, Yanghua Xiao, Rui Xie

Abstract

The concepts in knowledge graphs (KGs) enable machines to understand natural language and thus play an indispensable role in many applications. However, existing KGs have poor coverage of concepts, especially fine-grained concepts. To supply existing KGs with more fine-grained and new concepts, we propose a novel concept extraction framework, MRC-CE, which extracts large-scale multi-granular concepts from the descriptive texts of entities. Specifically, MRC-CE is built on a BERT-based machine reading comprehension model that uses a pointer network to extract more fine-grained concepts. In addition, a random forest and rule-based pruning are adopted to improve MRC-CE's precision and recall simultaneously. Our experiments on KGs in two languages, English Probase and Chinese CN-DBpedia, justify MRC-CE's superiority over state-of-the-art extraction models in KG completion. In particular, after running MRC-CE for each entity in CN-DBpedia, more than 7,053,900 new concepts (instanceOf relations) are supplied to the KG. The code and datasets have been released at https://github.com/fcihraeipnusnacwh/MRC-CE
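To make the pointer-style extraction mentioned in the abstract more concrete, the following is a minimal sketch of how candidate concept spans might be decoded from per-token start and end probabilities. It is not the authors' implementation: the function name `extract_spans`, the threshold, the maximum span length, the use of the sum of start and end probabilities as a score, and the probability values are all assumptions made for illustration. The random forest and rule-based pruning steps are not shown.

```python
from typing import List, Tuple


def extract_spans(
    tokens: List[str],
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,   # hypothetical cut-off, not taken from the paper
    max_len: int = 4,         # hypothetical maximum span length in tokens
) -> List[Tuple[str, float]]:
    """Return every span whose start and end probabilities both reach the threshold.

    Overlapping spans are kept on purpose, so that a coarse concept and a
    finer-grained one covering the same tokens can both be returned.
    """
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        # Only consider end positions at or after the start, within max_len tokens.
        for j in range(i, min(i + max_len, len(tokens))):
            p_end = end_probs[j]
            if p_end >= threshold:
                text = " ".join(tokens[i : j + 1])
                spans.append((text, p_start + p_end))  # assumed scoring rule
    # Highest-scoring candidates first.
    spans.sort(key=lambda s: s[1], reverse=True)
    return spans


# Made-up probabilities standing in for the output of an MRC model.
tokens = ["a", "famous", "Chinese", "pop", "singer"]
start_probs = [0.1, 0.2, 0.8, 0.7, 0.9]
end_probs = [0.0, 0.1, 0.1, 0.2, 0.95]

spans = extract_spans(tokens, start_probs, end_probs)
for text, score in spans:
    print(f"{text}: {score:.2f}")
```

With these made-up inputs, the sketch returns three nested candidates ("singer", "Chinese pop singer", "pop singer"), which illustrates how one shared end position can yield concepts at several granularities.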
