Paper Title
Dragoman: Efficiently Evaluating Declarative Mapping Languages over Frameworks for Knowledge Graph Creation
Authors
Abstract
In recent years, there have been valuable efforts to make the process of RDF knowledge graph creation traceable and transparent; extending and applying declarative mapping languages is one example. One challenging step is the traceability of procedures that aim to overcome interoperability issues, also known as data-level integration. In most pipelines, data integration is performed by ad-hoc programs, which prevents traceability and reusability. However, the formal frameworks provided by function-based declarative mapping languages such as FunUL and RML+FnO increase expressiveness: data-level integration can be defined as functions and integrated into the mappings that perform schema-level integration. Combining functions with mappings, however, introduces a new source of complexity that can considerably impact the required resources and execution time. We tackle the problem of efficiently executing mappings with functions and formalize their transformation into function-free mappings. These transformations are the basis of an optimization process that aims to perform an eager evaluation of function-based mapping rules. These techniques are implemented in a framework named Dragoman. We demonstrate the correctness of the transformations while ensuring that the function-free data integration processes are equivalent to the original ones. The effectiveness of Dragoman is empirically evaluated on 230 testbeds composed of various types of functions integrated with mapping rules of different complexity. The outcomes suggest that evaluating function-free mapping rules reduces execution time in complex knowledge graph creation pipelines composed of large data sources and multiple types of mapping rules. The savings can be up to 75%, suggesting that eagerly executing functions in mapping rules makes these pipelines applicable and scalable in real-world settings.
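The idea of eagerly evaluating functions and then running a function-free mapping can be illustrated with a minimal Python sketch. It is not Dragoman's implementation and does not use RML, FnO, or FunUL syntax. The function `normalize_gene`, the column names, the `http://example.org/` IRIs, and the helpers `lazy_mapping`, `eager_materialize`, and `function_free_mapping` are all hypothetical, chosen only for illustration. The sketch assumes a deterministic function applied to a single column: it evaluates the function once per distinct input value, stores the result in a new column, and then applies a mapping that only reads columns.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Triple = Tuple[str, str, str]

SUBJECT_TEMPLATE = "http://example.org/sample/{id}"   # hypothetical IRI template
PREDICATE = "http://example.org/gene"                 # hypothetical predicate


def normalize_gene(value: str) -> str:
    """Hypothetical data-level integration function (assumed deterministic)."""
    return value.strip().upper()


def lazy_mapping(rows: List[Row], fn: Callable[[str], str]) -> List[Triple]:
    """Function-based mapping: fn is called while each triple is generated."""
    return [(SUBJECT_TEMPLATE.format(**row), PREDICATE, fn(row["gene"]))
            for row in rows]


def eager_materialize(rows: List[Row], source: str, target: str,
                      fn: Callable[[str], str]) -> List[Row]:
    """Evaluate fn once per distinct input and store the output in a new column."""
    cache: Dict[str, str] = {}
    result = []
    for row in rows:
        value = row[source]
        if value not in cache:
            cache[value] = fn(value)
        result.append({**row, target: cache[value]})
    return result


def function_free_mapping(rows: List[Row], object_column: str) -> List[Triple]:
    """Function-free mapping: the object is read directly from a column."""
    return [(SUBJECT_TEMPLATE.format(**row), PREDICATE, row[object_column])
            for row in rows]


rows = [
    {"id": "1", "gene": " brca1 "},
    {"id": "2", "gene": "tp53"},
    {"id": "3", "gene": " brca1 "},
]

lazy_triples = lazy_mapping(rows, normalize_gene)
materialized = eager_materialize(rows, "gene", "gene_norm", normalize_gene)
eager_triples = function_free_mapping(materialized, "gene_norm")
```

In this toy setting both routes produce the same triples, while the eager route calls the function only for the two distinct input values instead of once per row. Whether such savings carry over to a real pipeline depends on the data and the mapping rules.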