Paper Title

A Comprehensive Survey on Distributed Training of Graph Neural Networks

Authors

Haiyang Lin, Mingyu Yan, Xiaochun Ye, Dongrui Fan, Shirui Pan, Wenguang Chen, Yuan Xie

Abstract

Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training, which distributes the workload of training across multiple computing nodes. At present, the volume of related research on distributed GNN training is exceptionally vast, accompanied by an extraordinarily rapid pace of publication. Moreover, the approaches reported in these studies exhibit significant divergence. This situation poses a considerable challenge for newcomers, hindering their ability to grasp a comprehensive understanding of the workflows, computational patterns, communication strategies, and optimization techniques employed in distributed GNN training. As a result, there is a pressing need for a survey to provide correct recognition, analysis, and comparisons in this field. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to its workflow. In addition, the computational patterns and communication patterns of these categories, as well as the optimization techniques proposed by recent work, are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
