Paper Title

RDFFrames: Knowledge Graph Access for Machine Learning Tools

Authors

Aisha Mohamed, Ghadeer Abuoda, Abdurrahman Ghanem, Zoi Kaoudi, Ashraf Aboulnaga

Abstract

Knowledge graphs represented as RDF datasets are integral to many machine learning applications. RDF is supported by a rich ecosystem of data management systems and tools, most notably RDF database systems that provide a SPARQL query interface. Surprisingly, machine learning tools for knowledge graphs do not use SPARQL, despite the obvious advantages of using a database system. This is due to the mismatch between SPARQL and machine learning tools in terms of data model and programming style. Machine learning tools work on data in tabular format and process it using an imperative programming style, while SPARQL is declarative and has as its basic operation matching graph patterns to RDF triples. We posit that a good interface to knowledge graphs from a machine learning software stack should use an imperative, navigational programming paradigm based on graph traversal rather than the SPARQL query paradigm based on graph patterns. In this paper, we present RDFFrames, a framework that provides such an interface. RDFFrames provides an imperative Python API that gets internally translated to SPARQL, and it is integrated with the PyData machine learning software stack. RDFFrames enables the user to make a sequence of Python calls to define the data to be extracted from a knowledge graph stored in an RDF database system, and it translates these calls into a compact SPARQL query, executes it on the database system, and returns the results in a standard tabular format. Thus, RDFFrames is a useful tool for data preparation that combines the usability of PyData with the flexibility and performance of RDF database systems.
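
To make the idea of an imperative, navigational API that compiles to SPARQL more concrete, here is a minimal sketch. It is not the RDFFrames API: the `Frame` class, the `entities`, `expand`, `filter`, and `to_sparql` names, and the `dbpo:`/`dbpp:`/`dbpr:` prefixed names are all hypothetical, chosen only for illustration. The sketch records each navigation step as a triple pattern and only builds the query text at the end; PREFIX declarations, query execution against a database, and conversion of results to a table are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Frame:
    """Hypothetical stand-in for a lazily evaluated table of graph entities."""
    columns: List[str]
    patterns: List[Tuple[str, str, str]] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)

    def expand(self, src: str, predicate: str, new_col: str) -> "Frame":
        """Follow `predicate` from column `src`; bind the target to `new_col`."""
        if src not in self.columns:
            raise KeyError(f"unknown column: {src}")
        return Frame(
            self.columns + [new_col],
            self.patterns + [(f"?{src}", predicate, f"?{new_col}")],
            list(self.filters),
        )

    def filter(self, condition: str) -> "Frame":
        """Keep only rows satisfying a SPARQL boolean expression."""
        return Frame(list(self.columns), list(self.patterns),
                     self.filters + [condition])

    def to_sparql(self) -> str:
        """Compile all recorded steps into a single SELECT query string."""
        select = " ".join(f"?{c}" for c in self.columns)
        body = [f"  {s} {p} {o} ." for s, p, o in self.patterns]
        body += [f"  FILTER ({f})" for f in self.filters]
        return "SELECT " + select + "\nWHERE {\n" + "\n".join(body) + "\n}"


def entities(class_iri: str, col: str) -> Frame:
    """Seed a frame with all instances of an RDF class (assumed helper)."""
    return Frame([col], [(f"?{col}", "rdf:type", class_iri)])


# A sequence of imperative calls describing the data to extract.
movies = (
    entities("dbpo:Film", "movie")
    .expand("movie", "dbpp:starring", "actor")
    .expand("actor", "dbpp:birthPlace", "country")
    .filter("?country = dbpr:United_States")
)
query = movies.to_sparql()
print(query)
```

The point of the sketch is the division of labor: the user writes step-by-step traversal calls, while a single declarative query is generated once and could then be sent to an RDF database, with the result set loaded into a tabular structure such as a pandas DataFrame.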
