Paper Title
Local SGD: Unified Theory and New Efficient Methods
Paper Authors
Paper Abstract
We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as special cases of our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both a random and a deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators that can adjust the fixed points of the local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers that outperform existing methods. In particular, we develop the first linearly converging local SGD method that does not require any data homogeneity or other strong assumptions.
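To make the setting concrete, below is a minimal sketch (not taken from the paper) of plain Local SGD/FedAvg next to a SCAFFOLD-style variant with shifted local gradient estimators, where each worker corrects its local steps with a control variate. The quadratic local objectives, the function names `local_sgd` and `scaffold`, and all step sizes and problem dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative heterogeneous setup (assumption, not from the paper):
# worker i holds a quadratic objective f_i(x) = 0.5 * ||A_i x - b_i||^2,
# so the local gradient below is exact and heterogeneity comes from (A_i, b_i).
n_workers, dim, samples = 4, 10, 20
A = [rng.standard_normal((samples, dim)) for _ in range(n_workers)]
b = [rng.standard_normal(samples) for _ in range(n_workers)]

def grad(i, x):
    """Gradient of worker i's local objective at x."""
    return A[i].T @ (A[i] @ x - b[i])

def local_sgd(x, lr=0.005, local_steps=10, rounds=200):
    """Plain Local SGD / FedAvg: each worker takes `local_steps` gradient
    steps from the current server model, then the server averages."""
    x = x.copy()
    for _ in range(rounds):
        local_models = []
        for i in range(n_workers):
            y = x.copy()
            for _ in range(local_steps):
                y -= lr * grad(i, y)
            local_models.append(y)
        x = np.mean(local_models, axis=0)  # communication round: average iterates
    return x

def scaffold(x, lr=0.005, local_steps=10, rounds=200):
    """SCAFFOLD-style shifted estimator: each worker corrects its local
    gradient with a control variate c_i, shifting the fixed point of the
    local iterations to compensate for heterogeneous data."""
    x = x.copy()
    c_local = [np.zeros(dim) for _ in range(n_workers)]
    c_global = np.zeros(dim)
    for _ in range(rounds):
        local_models, new_controls = [], []
        for i in range(n_workers):
            y = x.copy()
            for _ in range(local_steps):
                y -= lr * (grad(i, y) - c_local[i] + c_global)  # shifted local step
            # Control-variate update (full participation assumed)
            new_controls.append(c_local[i] - c_global + (x - y) / (local_steps * lr))
            local_models.append(y)
        x = np.mean(local_models, axis=0)
        c_local = new_controls
        c_global = np.mean(c_local, axis=0)
    return x

x0 = np.zeros(dim)
print("Local SGD solution norm:", np.linalg.norm(local_sgd(x0)))
print("SCAFFOLD  solution norm:", np.linalg.norm(scaffold(x0)))
```

On this toy problem the unshifted method stalls at a heterogeneity-dependent neighborhood of the optimum, while the shifted variant keeps contracting, which is the behavior the abstract's shifted estimators are designed to exploit.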