Paper Title
Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression
Paper Authors
Paper Abstract
In federated learning (FL) systems, e.g., wireless networks, the communication cost between the clients and the central server can often be a bottleneck. To reduce the communication cost, communication compression has become a popular strategy in the literature. In this paper, we focus on biased gradient compression techniques in non-convex FL problems. In the classical setting of distributed learning, error feedback (EF) is a common technique to remedy the downsides of biased gradient compression. In this work, we study a compressed FL scheme equipped with error feedback, named Fed-EF. We further propose two variants, Fed-EF-SGD and Fed-EF-AMS, depending on the choice of the global model optimizer. We provide a generic theoretical analysis, which shows that directly applying biased compression in FL leads to a non-vanishing bias in the convergence rate. The proposed Fed-EF is able to match the convergence rate of its full-precision FL counterparts under data heterogeneity, with a linear speedup. Moreover, we develop a new analysis of EF under partial client participation, which is an important scenario in FL. We prove that under partial participation, the convergence rate of Fed-EF exhibits an extra slow-down factor due to a so-called "stale error compensation" effect. A numerical study is conducted to justify the intuitive impact of stale error accumulation on the norm convergence of Fed-EF under partial participation. Finally, we also demonstrate that incorporating two-way compression in Fed-EF does not change the convergence results. In summary, our work conducts a thorough analysis of error feedback in federated non-convex optimization. Our analysis with partial client participation also provides insights on a theoretical limitation of the error feedback mechanism, and possible directions for improvements.
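To make the error feedback idea in the abstract concrete, the following is a minimal sketch of generic EF with a biased top-k compressor. It is not the paper's Fed-EF algorithm: the names top_k and ErrorFeedbackClient, the choice of top-k as the compressor, and the use of NumPy are all assumptions made for illustration. The client adds its stored residual to each new update, transmits only the compressed result, and keeps whatever the compressor dropped for the next round.

```python
import numpy as np


def top_k(v, k):
    """Biased compressor (illustrative choice): keep the k largest-magnitude
    entries of v and zero out the rest. Requires 1 <= k <= len(v)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


class ErrorFeedbackClient:
    """Hypothetical client that applies error feedback around top_k."""

    def __init__(self, dim, k):
        self.k = k
        self.error = np.zeros(dim)  # residual not yet transmitted

    def compress(self, update):
        # Add back what earlier rounds failed to transmit.
        corrected = update + self.error
        message = top_k(corrected, self.k)
        # Store the part the compressor dropped for the next round.
        self.error = corrected - message
        return message


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    client = ErrorFeedbackClient(dim=8, k=2)
    sent = np.zeros(8)
    true_sum = np.zeros(8)
    for _ in range(50):
        update = rng.normal(size=8)
        true_sum += update
        sent += client.compress(update)
    # Everything not yet sent is held in the residual.
    print(np.allclose(sent + client.error, true_sum))
```

In this sketch, the transmitted messages plus the stored residual always add up to the sum of the raw updates, so compression delays information rather than discarding it. If a client skips rounds, its residual sits unused while the global model moves on, which gives some intuition for the "stale error compensation" effect the abstract describes under partial participation.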