Paper Title
PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication
Paper Authors
Paper Abstract
Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning on graph-structured data, and training large-scale GCNs requires distributed training across multiple accelerators such that each accelerator is able to hold a partitioned subgraph. However, distributed GCN training incurs prohibitive overhead of communicating node features and feature gradients among partitions for every GCN layer during each training iteration, limiting the achievable training efficiency and model scalability. To this end, we propose PipeGCN, a simple yet effective scheme that hides the communication overhead by pipelining inter-partition communication with intra-partition computation. Pipelining for efficient GCN training is non-trivial, as communicated node features/gradients will become stale and thus can harm the convergence, negating the pipeline benefit. Notably, little is known regarding the convergence rate of GCN training with both stale features and stale feature gradients. This work not only provides a theoretical convergence analysis but also finds the convergence rate of PipeGCN to be close to that of vanilla distributed GCN training without any staleness. Furthermore, we develop a smoothing method to further improve PipeGCN's convergence. Extensive experiments show that PipeGCN can largely boost the training throughput (1.7x~28.5x) while achieving the same accuracy as its vanilla counterpart and existing full-graph training methods. The code is available at https://github.com/RICE-EIC/PipeGCN.
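The core idea in the abstract — hiding communication latency by letting each layer compute with boundary-node features received in the *previous* iteration — can be illustrated with a minimal, single-process sketch. This is not the authors' implementation (which pipelines asynchronous communication across accelerators); the `PipelinedLayer` class, its buffer layout, and the toy aggregation below are all illustrative assumptions.

```python
class PipelinedLayer:
    """Toy model of staleness-based pipelining: the current iteration
    computes with the boundary features buffered from iteration t-1,
    while (conceptually) the fresh features are exchanged in parallel."""

    def __init__(self, num_boundary, feat_dim):
        # Stale buffer, initialized to zeros before the first iteration.
        self.stale_boundary = [[0.0] * feat_dim for _ in range(num_boundary)]

    def forward(self, local_feats, fresh_boundary):
        # 1) Kick off the (conceptually asynchronous) exchange of fresh
        #    boundary features, which will only be consumed NEXT iteration.
        next_buffer = fresh_boundary
        # 2) Meanwhile, compute THIS iteration using the stale buffer,
        #    so communication latency is hidden behind computation.
        combined = local_feats + self.stale_boundary  # placeholder "aggregation"
        # 3) Swap in the fresh features for the next iteration.
        self.stale_boundary = next_buffer
        return combined


layer = PipelinedLayer(num_boundary=2, feat_dim=3)
# Iteration 1: boundary contribution is the zero-initialized stale buffer.
out1 = layer.forward([[1.0, 1.0, 1.0]], [[2.0] * 3, [2.0] * 3])
# Iteration 2: boundary contribution is the features sent during iteration 1.
out2 = layer.forward([[1.0, 1.0, 1.0]], [[3.0] * 3, [3.0] * 3])
```

The one-iteration lag in `out2` (it sees the `2.0` features, not the `3.0` ones) is exactly the staleness whose effect on convergence the paper analyzes.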