时间属性图的分布式路径查询引擎

论文标题

时间属性图的分布式路径查询引擎

A Distributed Path Query Engine for Temporal Property Graphs

论文作者

Ramesh, Shriram, Baranawal, Animesh, Simmhan, Yogesh

论文摘要

属性图是链接数据的常见形式，其路径查询用于遍历并探索它们进行企业交易和采矿。时间属性图是最近的变体，其中时间是要查询的一流实体，其属性和结构随时间而变化。这些在社会，电信，过境和流行病网络中可以看到。但是，当前的图形数据库和查询引擎对图形实体之间的时间关系的支持有限，对时变实体的支持和/或不扩展分布式资源。我们通过将线性路径查询模型扩展到属性图中来解决此差距，以在时间图上包括直觉的时间谓词和聚合操作员。我们使用以间隔计算模型为这些时间路径查询设计了分布式执行模型，并开发了一种新颖的成本模型，以从几个中选择有效的执行计划。我们使用静态和动态的时间属性图对花岗岩分布式查询引擎进行详细实验，该图形较大至52m的顶点，218m边缘和32.5m属性，以及从LDBC基准测试得出的1600次质量工作负载。我们经常在商品群集上提供次秒的查询潜伏期，与行业领先的NEO4J共享内存数据库和Janusgraph / Spark分布式图查询引擎相比，该群集的速度快149x-1140x。花岗岩还完成了所有图表的100％查询，而基线系统的工作负载仅为32-92％。此外，我们的成本模型选择了一个查询计划，该计划在90％的情况下的最佳执行时间的10％以内。尽管图形处理的性质不规则，但对于大多数查询工作负载，我们在8个节点上表现出弱尺度效率> = 60％，在16个节点上> = 40％。

Property graphs are a common form of linked data, with path queries used to traverse and explore them for enterprise transactions and mining. Temporal property graphs are a recent variant where time is a first-class entity to be queried over, and their properties and structure vary over time. These are seen in social, telecom, transit and epidemic networks. However, current graph databases and query engines have limited support for temporal relations among graph entities, no support for time-varying entities and/or do not scale on distributed resources. We address this gap by extending a linear path query model over property graphs to include intuitive temporal predicates and aggregation operators over temporal graphs. We design a distributed execution model for these temporal path queries using the interval-centric computing model, and develop a novel cost model to select an efficient execution plan from several. We perform detailed experiments of our Granite distributed query engine using both static and dynamic temporal property graphs as large as 52M vertices, 218M edges and 325M properties, and a 1600-query workload, derived from the LDBC benchmark. We often offer sub-second query latencies on a commodity cluster, which is 149x-1140x faster compared to industry-leading Neo4J shared-memory graph database and the JanusGraph / Spark distributed graph query engine. Granite also completes 100% of the queries for all graphs, compared to only 32-92% workload completion by the baseline systems. Further, our cost model selects a query plan that is within 10% of the optimal execution time in 90% of the cases. Despite the irregular nature of graph processing, we exhibit a weak-scaling efficiency >= 60% on 8 nodes and >= 40% on 16 nodes, for most query workloads.

下载PDF全文

下载文献需遵守相关版权规定

论文标题