Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain

Paper Title

Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain

Authors

Danilo Dessì, Francesco Osborne, Diego Reforgiato Recupero, Davide Buscaldi, Enrico Motta

Abstract

The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is that analysing this literature has become difficult because of the high volume of published papers, which require manual effort for annotation and management. Novel technological infrastructures are needed to help researchers, research policy makers, and companies browse, analyse, and forecast scientific research in a time-efficient way. Knowledge graphs, i.e., large networks of entities and relationships, have proved to be an effective solution in this space. Scientific knowledge graphs focus on the scholarly domain and typically contain metadata describing research publications, such as authors, venues, organizations, research topics, and citations. However, the current generation of knowledge graphs lacks an explicit representation of the knowledge presented in the research papers. In this paper, we therefore present a new architecture that uses Natural Language Processing and Machine Learning methods to extract entities and relationships from research publications and integrate them into a large-scale knowledge graph. In this work, we i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools, ii) describe an approach for integrating the entities and relationships generated by these tools, iii) show the advantage of such a hybrid system over alternative approaches, and iv) as a chosen use case, generate a scientific knowledge graph of 109,105 triples extracted from 26,827 abstracts of papers within the Semantic Web domain. As our approach is general and can be applied to any domain, we expect that it can facilitate the management, analysis, dissemination, and processing of scientific knowledge.
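
The integration step mentioned in the abstract (combining the entities and relationships produced by several extraction tools) can be pictured with a minimal sketch. This is not the authors' method: the function names `normalize` and `integrate`, the tool names, the sample triples, and the `min_support` threshold are all hypothetical and serve only to illustrate how triples from different tools might be merged once surface variants of a label are mapped to one form.

```python
from __future__ import annotations

from collections import defaultdict

Triple = tuple[str, str, str]


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so surface variants share one key."""
    return " ".join(label.lower().split())


def integrate(outputs: dict[str, list[Triple]], min_support: int = 1) -> dict[Triple, list[str]]:
    """Merge triples from several extractors (hypothetical scheme).

    Returns each normalized triple with the sorted list of tools that
    produced it, keeping only triples found by at least `min_support` tools.
    """
    support: dict[Triple, set[str]] = defaultdict(set)
    for tool, triples in outputs.items():
        for subj, rel, obj in triples:
            key = (normalize(subj), normalize(rel), normalize(obj))
            support[key].add(tool)
    return {
        triple: sorted(tools)
        for triple, tools in support.items()
        if len(tools) >= min_support
    }


# Illustrative input only: tool names and triples are made up.
outputs = {
    "tool_a": [("Ontology Alignment", "uses", "Machine  Learning")],
    "tool_b": [
        ("ontology alignment", "uses", "machine learning"),
        ("SPARQL", "queries", "RDF"),
    ],
}
graph = integrate(outputs)
```

Raising `min_support` to 2 in this sketch keeps only triples that more than one tool agrees on, which trades coverage for precision; whether the paper's system makes that trade-off in this way is not stated in the abstract.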
