Paper Title

STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization

Authors

Zhu, Feng, Zhang, Jingjing, Wang, Xin

Abstract

Synchronous local stochastic gradient descent (local SGD) suffers from idle workers and random delays caused by slow, straggling workers, because it waits for every worker to complete the same number of local updates. In this paper, to mitigate stragglers and improve communication efficiency, a novel local SGD strategy, named STSyn, is developed. The key idea is to wait only for the $K$ fastest workers, while keeping all the workers computing continually in each synchronization round and making full use of any effective (completed) local update of each worker regardless of stragglers. An analysis of the average wall-clock time, the average number of local updates, and the average number of uploading workers per round is provided to gauge the performance of STSyn. The convergence of STSyn is also rigorously established even when the objective function is nonconvex. Experimental results show the superiority of the proposed STSyn over state-of-the-art schemes, which comes from the straggler-tolerant technique and the additional effective local updates at each worker; the influence of the system parameters is also studied. By waiting only for the faster workers and allowing heterogeneous synchronization with different numbers of local updates across workers, STSyn provides substantial improvements in both time and communication efficiency.
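The synchronization rule described in the abstract can be illustrated with a small timing simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_round`, the parameter `u` (the target number of local updates per round), and the i.i.d. exponential computation-time model are all hypothetical choices made for illustration. The sketch also assumes one reading of the abstract, namely that the round ends once the $K$-th fastest worker completes `u` local updates and that every worker contributes however many updates it has completed by then.

```python
import random
from itertools import accumulate


def simulate_round(num_workers, k, u, rng):
    """Simulate the timing of one synchronization round.

    Returns (sync_time, stsyn_time, completed), where `completed[i]` is the
    number of local updates worker i has finished when the round ends.
    """
    # Assumed delay model: each local update takes an i.i.d. Exp(1) time.
    # finish[i][j] is the time at which worker i completes its (j+1)-th update.
    finish = [
        list(accumulate(rng.expovariate(1.0) for _ in range(u)))
        for _ in range(num_workers)
    ]
    t_u = [ft[-1] for ft in finish]  # time for each worker to finish u updates

    # Fully synchronous local SGD waits for the slowest worker.
    sync_time = max(t_u)
    # Straggler-tolerant rule (assumed): stop at the k-th fastest worker.
    stsyn_time = sorted(t_u)[k - 1]

    completed = []
    for ft in finish:
        # Workers keep computing until the round ends, so fast workers
        # may complete more than u updates.
        while ft[-1] <= stsyn_time:
            ft.append(ft[-1] + rng.expovariate(1.0))
        completed.append(sum(1 for t in ft if t <= stsyn_time))
    return sync_time, stsyn_time, completed


rng = random.Random(0)
sync_time, stsyn_time, completed = simulate_round(num_workers=8, k=4, u=5, rng=rng)
# Workers with at least one completed local update have something to upload.
uploading = sum(1 for c in completed if c >= 1)
print(f"synchronous round time: {sync_time:.2f}")
print(f"straggler-tolerant round time: {stsyn_time:.2f}")
print(f"completed local updates per worker: {completed}")
print(f"uploading workers: {uploading}")
```

Under this model the straggler-tolerant round never takes longer than the fully synchronous one, and at least `k` workers finish `u` or more updates, while slower workers still contribute whatever they completed. Averaging these quantities over many simulated rounds gives rough empirical counterparts of the per-round wall-clock time, number of local updates, and number of uploading workers that the abstract says are analyzed.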
