Paper Title

torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models

Authors

Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, Sungwoong Kim

Abstract

We design and implement a ready-to-use library in PyTorch for performing micro-batch pipeline parallelism with checkpointing proposed by GPipe (Huang et al., 2019). In particular, we develop a set of design components to enable pipeline-parallel gradient computation in PyTorch's define-by-run and eager execution environment. We show that each component is necessary to fully benefit from pipeline parallelism in such an environment, and demonstrate the efficiency of the library by applying it to various network architectures including AmoebaNet-D and U-Net. Our library is available at https://github.com/kakaobrain/torchgpipe.
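The abstract describes micro-batch pipeline parallelism: a mini-batch is split into micro-batches that flow through sequential model partitions, so different partitions can work on different micro-batches at the same clock cycle. Below is a minimal pure-Python sketch of that schedule, not torchgpipe's actual implementation; the partition and micro-batch counts are illustrative:

```python
# Sketch of a GPipe-style micro-batch pipeline schedule.
# With J partitions and M micro-batches, partition j processes
# micro-batch i during clock cycle i + j, so one forward pass
# takes M + J - 1 cycles instead of M * J sequential steps.

def pipeline_schedule(num_micro_batches: int, num_partitions: int):
    """Yield, for each clock cycle, the (micro_batch, partition)
    pairs that can run concurrently in that cycle."""
    total_cycles = num_micro_batches + num_partitions - 1
    for clock in range(total_cycles):
        yield [(i, clock - i)
               for i in range(num_micro_batches)
               if 0 <= clock - i < num_partitions]

# Example: 4 micro-batches flowing through 3 partitions.
schedule = list(pipeline_schedule(4, 3))
```

In the library itself, this schedule is driven by wrapping an `nn.Sequential` model with the `GPipe` class (its `balance` argument assigns layers to partitions and `chunks` sets the number of micro-batches), with activation checkpointing used to keep memory bounded.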
