Paper Title

Distributed Sparse SGD with Majority Voting

Paper Authors

Kerem Ozfatura, Emre Ozfatura, Deniz Gunduz

Paper Abstract

Distributed learning, particularly variants of distributed stochastic gradient descent (DSGD), is widely employed to speed up training by leveraging the computational resources of several workers. However, in practice, communication delay becomes a bottleneck due to the significant amount of information that needs to be exchanged between the workers and the parameter server. One of the most efficient strategies to mitigate the communication bottleneck is top-K sparsification. However, top-K sparsification requires additional communication load to represent the sparsity pattern, and the mismatch between the sparsity patterns of the workers prevents the exploitation of efficient communication protocols. To address these issues, we introduce a novel majority-voting-based sparse communication strategy, in which the workers first seek a consensus on the structure of the sparse representation. This strategy provides a significant reduction in the communication load and allows using the same sparsity level in both communication directions. Through extensive simulations on the CIFAR-10 dataset, we show that it is possible to achieve up to ×4000 compression without any loss in test accuracy.
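To make the consensus step concrete, the following is a minimal NumPy sketch of a majority-voting top-K scheme in the spirit of the abstract: each worker votes for its local top-K coordinates, the parameter server keeps the K most-voted coordinates as the shared sparsity pattern, and only those entries are exchanged in both directions. The function names, the vote encoding, and all other details are illustrative assumptions, not the authors' exact algorithm.

    import numpy as np

    def local_topk_votes(gradient: np.ndarray, k: int) -> np.ndarray:
        # Each worker votes for the indices of its k largest-magnitude gradient entries.
        idx = np.argpartition(np.abs(gradient), -k)[-k:]
        votes = np.zeros(gradient.size, dtype=np.int32)
        votes[idx] = 1
        return votes

    def majority_vote_mask(all_votes: list, k: int) -> np.ndarray:
        # The parameter server tallies the votes and keeps the k most-voted
        # coordinates, so every worker uses the same sparsity pattern.
        tally = np.sum(all_votes, axis=0)
        winners = np.argpartition(tally, -k)[-k:]
        mask = np.zeros(tally.size, dtype=bool)
        mask[winners] = True
        return mask

    # Toy usage: 4 workers, 10-dimensional gradients, keep k = 3 coordinates.
    rng = np.random.default_rng(0)
    grads = [rng.normal(size=10) for _ in range(4)]
    votes = [local_topk_votes(g, k=3) for g in grads]
    mask = majority_vote_mask(votes, k=3)
    # Workers transmit only the masked entries; the server averages them on the
    # agreed sparsity pattern and broadcasts the result back with the same mask.
    aggregated = np.mean([g * mask for g in grads], axis=0)
    print(np.flatnonzero(mask), aggregated[mask])

Because the sparsity pattern is agreed upon before the values are sent, no per-worker index lists need to be communicated, which is where the claimed reduction in communication load comes from.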
