Paper Title

SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs

Authors

Enrique Iglesias, Samaneh Jozashoori, David Chaves-Fraga, Diego Collarana, Maria-Esther Vidal

Abstract

In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures for integrating data and knowledge harvested from myriad data sources. However, these data sources are usually characterized by data complexity issues such as large volume, high duplicate rates, and heterogeneity, so data management tools are required that can address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, thus allowing it to scale up to complex scenarios where data is not only broad but also has a high duplication rate. We empirically evaluate the performance of SDM-RDFizer against diverse testbeds with varying configurations of data volume, duplicates, and heterogeneity. The observed results indicate that SDM-RDFizer is two orders of magnitude faster than the state of the art, meaning that SDM-RDFizer is an interoperable and scalable solution for knowledge graph creation. SDM-RDFizer is publicly available as a resource through a GitHub repository and a DOI.
