Paper Title
Distributed Learning With Sparsified Gradient Differences
Paper Authors
Paper Abstract
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communications efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communications load compared to the best existing algorithms without slowing down the optimization process.
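To make the mechanism described in the abstract concrete, below is a minimal NumPy sketch of a single worker-side step: form the gradient difference against a reference built from previously transmitted information, suppress components whose magnitude falls below a threshold, and fold the suppressed part into an error-correction residual. The names (tau, h, e) and the particular reference update are illustrative assumptions, not the paper's exact update rule.

```python
# A minimal sketch of one worker-side GD-SEC step, reconstructed only from the
# mechanism described in the abstract. The threshold tau, the reference h, the
# residual e, and the reference update below are assumptions for illustration.
import numpy as np

def gdsec_worker_step(grad, state, tau=1e-3):
    """Return the sparsified message to transmit and the updated worker state."""
    h, e = state["h"], state["e"]  # h: reference built from past transmissions
                                   # e: error-correction residual

    # Difference between the current gradient and the reference, plus the
    # residual carried over from components suppressed in earlier steps.
    diff = grad - h + e

    # Sparsify: a component is transmitted only if its magnitude is large enough.
    mask = np.abs(diff) >= tau
    msg = np.where(mask, diff, 0.0)

    # Error correction: keep the suppressed part so it is transmitted later.
    state["e"] = diff - msg

    # Update the reference from what was actually transmitted; the server can
    # apply the same update, so worker and server stay in sync (assumed form).
    state["h"] = h + msg

    return msg, state

# Toy usage: tiny components are withheld and accumulate in the residual.
state = {"h": np.zeros(5), "e": np.zeros(5)}
msg, state = gdsec_worker_step(np.array([0.5, 1e-4, -0.2, 2e-5, 0.9]), state)
print(msg, state["e"])
```

In this sketch only the non-zero entries of `msg` would need to be communicated, which is the source of the bit savings; the residual `e` is what the abstract refers to as error correction, ensuring suppressed components are not lost but sent once they accumulate enough magnitude.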