Paper Title


SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization

Paper Authors

Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi

Abstract


In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors regulated by a locally computable triggering criterion. We provide convergence guarantees of our algorithm for general (non-convex) and convex smooth objectives, which, to the best of our knowledge, is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that the convergence rate of SQuARM-SGD matches that of vanilla SGD. We empirically show that including momentum updates in SQuARM-SGD can lead to better test performance than the current state-of-the-art which does not consider momentum updates.
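To make the per-node procedure described above concrete, here is a minimal, hedged sketch of one communication round at a single node: a fixed number of local Nesterov-momentum SGD steps, followed by a sparsified and quantized update that is sent only when a locally computable trigger fires. The helper names (`topk_sparsify`, `quantize`, `local_round`) and the specific compression operators, trigger rule, and hyperparameters are illustrative assumptions, not the exact operators or analysis conditions from the paper.

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest (illustrative sparsifier)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def quantize(v):
    """Simple sign quantizer scaled by the mean magnitude (illustrative quantizer)."""
    scale = np.mean(np.abs(v))
    return scale * np.sign(v)

def local_round(x, u, grad_fn, lr=0.01, beta=0.9, local_steps=5, k=10, threshold=1e-3):
    """One round at a node: local SGD steps with Nesterov momentum, then a
    compressed update communicated only if a locally computable trigger fires.
    All constants here are placeholders, not the paper's settings."""
    x_start = x.copy()
    for _ in range(local_steps):
        g = grad_fn(x + beta * u)   # stochastic gradient at the look-ahead point
        u = beta * u - lr * g       # Nesterov momentum update
        x = x + u
    delta = x - x_start             # net local change since the last communication
    compressed = quantize(topk_sparsify(delta, k))
    send = np.linalg.norm(delta) > threshold  # triggering criterion (assumed form)
    return x, u, (compressed if send else None)
```

In a full decentralized implementation, the compressed message would be exchanged with neighbors over the network graph and folded into each node's model; that mixing step is omitted here for brevity.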
