Paper Title
Two-Stage Coded Distributed Edge Learning: A Dynamic Partial Gradient Coding Perspective
Authors
Abstract
The widespread adoption of distributed learning, in which a global model is trained from local data, has been hindered by the challenge posed by stragglers. Recent attempts to mitigate this issue through gradient coding have proved difficult due to the large amounts of data redundancy and the computation and communication overhead they introduce. Additionally, the complexity of encoding and decoding increases linearly with the number of local workers. In this paper, we present a lightweight coding method for the computing phase and a fair transmission protocol for the communication phase to mitigate the straggler problem. A two-stage dynamic coding scheme is proposed for the computing phase, in which partial gradients are computed by a subset of workers in the first stage, and the remaining gradients are assigned in the second stage according to the workers' completion status in the first stage. To ensure fair communication, a perturbed Lyapunov function is designed to balance admission-data fairness and maximize throughput. Extensive experimental results demonstrate the superiority of our proposed solution in terms of accuracy and resource utilization in distributed learning systems, even under practical network conditions and on benchmark datasets.
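To make the two-stage idea concrete, the following is a minimal toy sketch, not the paper's actual algorithm: the function name, the Bernoulli straggler model, and the reassignment rule are all illustrative assumptions. In stage one, each worker computes an assigned gradient partition; partitions left uncovered by stragglers are then reassigned in stage two to workers that completed stage one.

```python
import random

def two_stage_partial_gradient(num_workers=6, num_parts=6,
                               straggle_prob=0.3, seed=0):
    """Toy simulation of a two-stage dynamic partial-gradient scheme.

    Stage 1: worker i computes gradient partition i; with probability
    `straggle_prob` a worker straggles and finishes nothing.
    Stage 2: partitions left uncovered by stragglers are reassigned to
    workers that completed stage 1.
    Returns the set of recovered partition indices.
    (All modeling choices here are hypothetical, for illustration only.)
    """
    rng = random.Random(seed)
    # Stage 1: simulate which workers finish their assigned partition.
    finished = [rng.random() > straggle_prob for _ in range(num_workers)]
    recovered = {i for i in range(num_parts) if finished[i % num_workers]}
    # Stage 2: dynamically reassign missing partitions to fast workers,
    # based on the completion status observed in stage 1.
    fast_workers = [i for i, ok in enumerate(finished) if ok]
    missing = [p for p in range(num_parts) if p not in recovered]
    for part, worker in zip(missing, fast_workers):
        # `worker` recomputes partition `part` in the second stage.
        recovered.add(part)
    return recovered
```

With enough non-straggling workers, every partition is recovered in at most two stages; if stragglers outnumber the fast workers, some partitions remain missing, mirroring the trade-off the dynamic assignment is meant to manage.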