Paper Title

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

Paper Authors

Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li

Paper Abstract

Nowadays, large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines, in which extensive communications between workers pose serious scaling problems. In this article, we present a quantitative survey of communication optimization techniques for data parallel distributed DL. We first identify the major communication challenges and classify the existing solutions into three levels, namely the learning algorithm, the system architecture, and the network infrastructure. We present the state-of-the-art communication optimization techniques and conduct a comparative study of seven common lossless distributed DL methods on a 32-GPU cluster with 100Gbps InfiniBand (IB). We show that (1) the DL models with low model intensity (such as BERT and BERT-Large) are difficult to scale out even with the best available lossless algorithm over 100Gbps IB; (2) the system architecture and scheduling algorithms have a critical impact on the scaling property. We conclude the article with discussions on the open issues for further investigations.
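
To make the communication pattern behind the abstract concrete, below is a minimal illustrative sketch (not code from the paper) of synchronous data-parallel training with PyTorch's torch.distributed package: each worker computes gradients on its own mini-batch shard, then all workers average their gradients with an all-reduce before applying the same update. The toy model, batch shapes, and the assumption of a torchrun launch with the NCCL backend are illustrative choices, not details taken from the survey.

"""Minimal sketch (not the authors' code) of synchronous data-parallel SGD.
The per-iteration all-reduce traffic shown here is the communication cost
that the survey's optimization techniques target."""
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    # Assumes launch via `torchrun --nproc_per_node=N this_script.py`,
    # which sets RANK / WORLD_SIZE / LOCAL_RANK for each worker.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)

    model = nn.Linear(1024, 10).to(device)           # toy stand-in for a DL model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):                               # a few training iterations
        x = torch.randn(32, 1024, device=device)      # this worker's data shard
        y = torch.randint(0, 10, (32,), device=device)

        optimizer.zero_grad()
        loss_fn(model(x), y).backward()               # local gradient computation

        # Communication phase: average gradients across all workers.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= dist.get_world_size()

        optimizer.step()                              # identical update on every worker

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

In practice, frameworks bucket these per-parameter all-reduce calls into larger messages and overlap them with the backward pass; such system-level scheduling choices are among the optimizations the survey compares.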
