Banyan: A Scoped Dataflow Engine for Graph Query Service

Paper Title

Banyan: A Scoped Dataflow Engine for Graph Query Service

Authors

Su, Li; Qin, Xiaoming; Zhang, Zichao; Yang, Rui; Xu, Le; Gupta, Indranil; Yu, Wenyuan; Zeng, Kai; Zhou, Jingren

Abstract

Graph query services (GQS) are widely used today to interactively answer graph traversal queries on large-scale graph data. Existing graph query engines focus largely on optimizing the latency of a single query. This ignores significant challenges posed by GQS, including fine-grained control and scheduling during query execution, as well as performance isolation and load balancing at various levels, from cross-user to intra-query. To tackle these control and scheduling challenges, we propose a novel scoped dataflow for modeling graph traversal queries, which explicitly exposes concurrent execution and control of any subquery to the finest granularity. We implemented Banyan, an engine based on the scoped dataflow model for GQS. Banyan focuses on scaling up performance on a single machine and provides the ability to easily scale out. Extensive experiments on multiple benchmarks show that Banyan improves performance by up to three orders of magnitude over state-of-the-art graph query engines, while providing performance isolation and load balancing.
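The abstract states only that the scoped dataflow exposes control of any subquery; the actual model is described in the full paper. As a rough illustration of what per-subquery control can mean, the Python sketch below gives each start vertex of a 2-hop traversal its own scope with a result limit, so one subquery can stop early without affecting its siblings, and cancelling a parent scope stops all of its children. All names here (`Scope`, `k_hop`, `run_query`, `per_start_limit`) are hypothetical and are not Banyan's API. Subqueries also run sequentially for simplicity, whereas the abstract describes concurrent execution.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scope:
    """Hypothetical execution scope: one per (sub)query instance."""
    name: str
    parent: Optional["Scope"] = None
    limit: Optional[int] = None   # per-scope result cap (enables early termination)
    emitted: int = 0
    cancelled: bool = False
    children: List["Scope"] = field(default_factory=list)

    def child(self, name: str, limit: Optional[int] = None) -> "Scope":
        """Open a nested scope, e.g. one subquery of the enclosing query."""
        s = Scope(name, parent=self, limit=limit)
        self.children.append(s)
        return s

    def is_active(self) -> bool:
        # Active only if neither this scope nor any ancestor was cancelled.
        s: Optional[Scope] = self
        while s is not None:
            if s.cancelled:
                return False
            s = s.parent
        return True

    def emit(self) -> None:
        # Count one output and cancel this scope once its limit is reached.
        self.emitted += 1
        if self.limit is not None and self.emitted >= self.limit:
            self.cancelled = True


def k_hop(graph: Dict[int, List[int]], start: int, hops: int, scope: Scope) -> List[int]:
    """Breadth-first expansion of `hops` levels from `start`, confined to `scope`."""
    results: List[int] = []
    frontier = deque([(start, 0)])
    while frontier and scope.is_active():
        v, depth = frontier.popleft()
        if depth == hops:
            results.append(v)
            scope.emit()
            continue
        for w in graph.get(v, []):
            frontier.append((w, depth + 1))
    return results


def run_query(graph: Dict[int, List[int]], starts: List[int], hops: int,
              per_start_limit: int) -> Dict[int, List[int]]:
    """Run one subquery per start vertex, each in its own child scope."""
    root = Scope("query")
    out: Dict[int, List[int]] = {}
    for s in starts:
        sub = root.child(f"sub-{s}", limit=per_start_limit)
        out[s] = k_hop(graph, s, hops, sub)
    return out
```

With `graph = {1: [2, 3], 2: [4, 5], 3: [5, 6], 7: [8], 8: [9, 10]}`, calling `run_query(graph, [1, 7], hops=2, per_start_limit=2)` returns `{1: [4, 5], 7: [9, 10]}`. The scope for vertex 1 is cancelled after two results even though two more candidates are still queued, while the scope for vertex 7 runs independently.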
