论文标题
cosimgnn:朝向大规模图相似度计算
CoSimGNN: Towards Large-scale Graph Similarity Computation
论文作者
论文摘要
在许多现实世界应用中,基于图表编辑距离(GED)等指标(例如图编辑距离(GED))计算图形之间的相似性得分的能力很重要。计算精确的GED值通常是一个NP硬性问题,传统算法通常在准确性和效率之间实现不令人满意的权衡。最近,图神经网络(GNNS)为此任务提供了数据驱动的解决方案,该解决方案更有效,同时保持小图中的预测准确性(每图约10个节点)相似性计算。现有的基于GNN的方法分别嵌入了两个图(缺乏低级横向相互作用)或用于整个图对(冗余且耗时)的部署跨界相互作用,当图中的节点的数量增加时,仍无法获得竞争性的结果。在本文中,我们着重于大规模图的相似性计算,并提出了“嵌入式磨合匹配”框架cosimgnn,该框架首先嵌入并与自适应合并操作的大图嵌入,然后在污垢的图形上部署细粒度的相互作用,以获得最终的相似性分数。此外,我们创建了几个合成数据集,这些数据集为图形相似性计算提供了新的基准。已经进行了有关合成和现实世界数据集的详细实验,并且Cosimgnn实现了最佳性能,而推理时间最多是以前的ETAM-TEA-THE-THE-ART的1/3。
The ability to compute similarity scores between graphs based on metrics such as Graph Edit Distance (GED) is important in many real-world applications. Computing exact GED values is typically an NP-hard problem and traditional algorithms usually achieve an unsatisfactory trade-off between accuracy and efficiency. Recently, Graph Neural Networks (GNNs) provide a data-driven solution for this task, which is more efficient while maintaining prediction accuracy in small graph (around 10 nodes per graph) similarity computation. Existing GNN-based methods, which either respectively embeds two graphs (lack of low-level cross-graph interactions) or deploy cross-graph interactions for whole graph pairs (redundant and time-consuming), are still not able to achieve competitive results when the number of nodes in graphs increases. In this paper, we focus on similarity computation for large-scale graphs and propose the "embedding-coarsening-matching" framework CoSimGNN, which first embeds and coarsens large graphs with adaptive pooling operation and then deploys fine-grained interactions on the coarsened graphs for final similarity scores. Furthermore, we create several synthetic datasets which provide new benchmarks for graph similarity computation. Detailed experiments on both synthetic and real-world datasets have been conducted and CoSimGNN achieves the best performance while the inference time is at most 1/3 of that of previous state-of-the-art.