Paper Title

A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs

Paper Authors

Qararyah, Fareed, Wahib, Mohamed, Dikbayır, Doğa, Belviranli, Mehmet Esat, Unat, Didem

Paper Abstract

Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training those models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs that are represented as computational graphs. ParDNN decides a placement of a DNN's underlying computational graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN. It requires no modification at either the model level or the systems-level implementation of its operation kernels. ParDNN partitions DNNs having billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while achieving superlinear scaling in both batch size and training throughput. ParDNN either outperforms or qualitatively improves upon the related work.
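
As a rough illustration of the placement idea described in the abstract, the sketch below shows how a per-operation device assignment could be applied in TensorFlow. The placement dictionary and layer names are hypothetical stand-ins for a partitioner's output; this is not ParDNN's actual interface, only a minimal example of pinning parts of a computational graph to specific devices.

import tensorflow as tf

# Hypothetical output of a graph partitioner: op/layer name -> device string.
# Falls back to the CPU when fewer GPUs are available than the plan assumes.
gpus = tf.config.list_physical_devices("GPU")
placement = {
    "dense_1": "/GPU:0" if len(gpus) > 0 else "/CPU:0",
    "dense_2": "/GPU:1" if len(gpus) > 1 else "/CPU:0",
}

x = tf.random.normal([8, 128])

# Operations created inside each tf.device scope run on the assigned device;
# tensors crossing a partition boundary are transferred between devices.
with tf.device(placement["dense_1"]):
    h = tf.keras.layers.Dense(256, activation="relu", name="dense_1")(x)

with tf.device(placement["dense_2"]):
    y = tf.keras.layers.Dense(10, name="dense_2")(h)

print(y.shape)  # (8, 10)

In the paper's setting, such an assignment is computed automatically from the computational graph so that each device's memory constraint is respected while training time is minimized.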
