Paper Title
Caramel: Accelerating Decentralized Distributed Deep Learning with Computation Scheduling
Paper Authors
Paper Abstract
The method of choice for parameter aggregation in Deep Neural Network (DNN) training, a network-intensive task, is shifting from the Parameter Server model to decentralized aggregation schemes (AllReduce) inspired by theoretical guarantees of better performance. However, current implementations of AllReduce overlook the interdependence of communication and computation, resulting in significant performance degradation. In this paper, we develop Caramel, a system that accelerates decentralized distributed deep learning through model-aware computation scheduling and communication optimizations for AllReduce. Caramel achieves this goal through (a) computation DAG scheduling that expands the feasible window of transfer for each parameter (transfer boundaries), and (b) network optimizations that smooth the load, including adaptive batching and pipelining of parameter transfers. Caramel maintains the correctness of the dataflow model, is hardware-independent, and does not require any user-level or framework-level changes. We implement Caramel over TensorFlow and show that the iteration time of DNN training can be improved by up to 3.62x in a cloud environment.
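To make the aggregation scheme referenced in the abstract concrete, below is a minimal, self-contained sketch of ring AllReduce simulated in-process with NumPy: the style of decentralized aggregation that the abstract contrasts with the Parameter Server model. This is an illustration only, not Caramel's implementation; the worker count, chunk layout, and the helper name ring_allreduce are assumptions made for the example.

```python
# Minimal in-process simulation of ring AllReduce (reduce-scatter + all-gather).
# Illustrative only; not Caramel's code. Worker count and chunking are assumed.
import numpy as np

def ring_allreduce(worker_grads):
    """Sum-reduce equally sized gradient vectors, one per simulated worker."""
    n = len(worker_grads)
    # Each worker splits its gradient into n contiguous chunks.
    chunks = [np.array_split(g.astype(float), n) for g in worker_grads]

    # Phase 1: reduce-scatter. After n-1 steps, worker i holds the fully
    # summed chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for src, idx, payload in sends:
            chunks[(src + 1) % n][idx] += payload

    # Phase 2: all-gather. Circulate the reduced chunks so that every worker
    # ends up with the complete summed gradient.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for src, idx, payload in sends:
            chunks[(src + 1) % n][idx] = payload

    return [np.concatenate(c) for c in chunks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.standard_normal(8) for _ in range(4)]  # 4 simulated workers
    reduced = ring_allreduce(grads)
    expected = np.sum(grads, axis=0)
    assert all(np.allclose(r, expected) for r in reduced)
    print("every worker holds the summed gradient:", reduced[0])
```

In this scheme each parameter's aggregation is an independent collective, which is why the per-parameter "feasible window of transfer" that Caramel's DAG scheduling expands, and the batching of small transfers it performs, directly affect how well communication overlaps with computation.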