论文标题
连续子图匹配的深入研究(完整版本)
An In-Depth Study of Continuous Subgraph Matching (Complete Version)
论文作者
论文摘要
连续子图匹配(CSM)算法在线上找到给定模式的出现。已经提出了许多增量CSM算法。但是,缺少对这些算法的系统研究,以确定其在广泛的工作量上的优势和缺点。因此,我们首先建议将CSM建模为增量视图维护(IVM),以捕获现有算法的设计空间。然后,我们在基于IVM的共同框架中实现了六种代表性CSM算法,包括Incisomatch,SJ-Tree,GraphFlow,Iedyn,TurboFlux和Symbi。我们进一步进行了广泛的实验,以评估竞争算法的整体性能,并研究各个技术的有效性,以查明导致性能差异的关键因素。我们可以获得以下对性能的新见解:(1)现有算法从查询图中的边缘启动搜索,该算法映射到更新的数据边缘,可能导致许多无效的部分结果; (2)所有匹配顺序均基于简单的启发式方法,有时似乎无效; (3)索引更新在某些查询上主导查询时间; (4)具有持续延迟枚举的算法具有显着的索引更新成本。因此,在所有情况下,没有任何算法主导其他算法。因此,我们根据实验结果给出了一些建议。特别是,SYMBI索引对于稀疏查询或长期运行的查询很有用。 Iededn和TurboFlux的匹配顺序在树查询,密集查询上的图形流量或查询和数据图都稀疏时效果很好,否则,我们建议Symbi的匹配订单。
Continuous subgraph matching (CSM) algorithms find the occurrences of a given pattern on a stream of data graphs online. A number of incremental CSM algorithms have been proposed. However, a systematical study on these algorithms is missing to identify their advantages and disadvantages on a wide range of workloads. Therefore, we first propose to model CSM as incremental view maintenance (IVM) to capture the design space of existing algorithms. Then, we implement six representative CSM algorithms, including IncIsoMatch, SJ-Tree, Graphflow, IEDyn, TurboFlux, and SymBi, in a common framework based on IVM. We further conduct extensive experiments to evaluate the overall performance of competing algorithms as well as study the effectiveness of individual techniques to pinpoint the key factors leading to the performance differences. We obtain the following new insights into the performance: (1) existing algorithms start the search from an edge in the query graph that maps to an updated data edge, potentially leading to many invalid partial results; (2) all matching orders are based on simple heuristics, which appear ineffective at times; (3) index updates dominate the query time on some queries; and (4) the algorithm with constant delay enumeration bears significant index update cost. Consequently, no algorithm dominates the others in all cases. Therefore, we give a few recommendations based on our experiment results. In particular, the SymBi index is useful for sparse queries or long running queries. The matching orders of IEDyn and TurboFlux work well on tree queries, those of Graphflow on dense queries or when both query and data graphs are sparse, and otherwise, we recommend SymBi's matching orders.