Paper Title
Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning
Paper Authors
Paper Abstract
While variance reduction methods have shown great success in solving large-scale optimization problems, many of them suffer from accumulated errors and therefore require periodic full gradient computations. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refreshes of the gradient estimator yet achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data. We build an efficient federated learning algorithm by exploiting these favorable properties. We show first- and second-order optimality of the output and also provide an analysis under the PL condition. When the local budget is sufficiently large and clients are less (Hessian-)heterogeneous, the algorithm requires fewer communication rounds than existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.
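As a rough sketch of the setting described above (the recursion shown is a generic STORM-style single-loop estimator for illustration only, not necessarily the exact SLEDGE update), the finite-sum problem and a refresh-free recursive gradient estimator can be written as

  \min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),

  v_t = \nabla f_{i_t}(x_t) + (1 - a_t)\bigl(v_{t-1} - \nabla f_{i_t}(x_{t-1})\bigr), \qquad x_{t+1} = x_t - \eta\, v_t,

in contrast to SVRG/SARAH-type estimators, which restart from a full gradient \nabla f(\tilde{x}) at the beginning of every epoch. The PL condition referenced in the abstract is the standard Polyak-Lojasiewicz inequality \|\nabla f(x)\|^2 \ge 2\mu \bigl(f(x) - f^*\bigr), under which gradient-type methods can attain exponential (linear) convergence.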