Paper Title
Scaling Graph Neural Networks with Approximate PageRank
Paper Authors
Paper Abstract
Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge - many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs resulting in significant speed gains while maintaining state-of-the-art prediction performance. In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings. We demonstrate that PPRGo outperforms baselines in both distributed and single-machine training environments on a number of commonly used academic graphs. To better analyze the scalability of large-scale graph learning methods, we introduce a novel benchmark graph with 12.4 million nodes, 173 million edges, and 2.8 million node features. We show that training PPRGo from scratch and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph. We discuss the practical application of PPRGo to solve large-scale node classification problems at Google.
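The "efficient approximation of information diffusion" the abstract refers to can be illustrated with a forward-push approximation of personalized PageRank, the kind of sparse diffusion PPRGo builds on. The sketch below is a minimal, hypothetical illustration of that general technique, not the paper's implementation; the function and parameter names (`approx_ppr`, `alpha`, `eps`) are illustrative assumptions. Its running time depends on `alpha` and `eps`, not on the total graph size, which is the property that makes this style of propagation scalable.

```python
def approx_ppr(neighbors, source, alpha=0.15, eps=1e-6):
    """Forward-push approximation of personalized PageRank from `source`.

    Illustrative sketch (not the paper's code). `neighbors` maps each node
    to a list of its neighbors. Mass is pushed from a node only while its
    residual exceeds eps * degree, so the result is sparse and the work is
    bounded independently of the total number of nodes.
    """
    p = {u: 0.0 for u in neighbors}  # approximate PPR scores
    r = {u: 0.0 for u in neighbors}  # residual mass not yet pushed
    r[source] = 1.0
    active = [source]
    while active:
        u = active.pop()
        deg = len(neighbors[u])
        if deg == 0 or r[u] < eps * deg:
            continue  # residual too small to push further
        mass = r[u]
        p[u] += alpha * mass          # keep an alpha-fraction at u
        r[u] = 0.0
        share = (1.0 - alpha) * mass / deg
        for v in neighbors[u]:        # spread the rest to neighbors
            r[v] += share
            if r[v] >= eps * len(neighbors[v]):
                active.append(v)
    return p
```

Only nodes whose residual crosses the threshold are ever touched, so for small `alpha` and moderate `eps` the push loop visits a tiny neighborhood of `source` rather than the whole graph.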