Paper Title
Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks
Paper Authors
Paper Abstract
Decentralized optimization over time-varying networks is an emerging paradigm in machine learning. It saves substantial communication overhead in large-scale deep training and is more robust in wireless scenarios, especially when nodes are moving. Federated learning can also be regarded as decentralized optimization with time-varying communication patterns that alternate between global averaging and local updates. While numerous studies have sought to clarify its theoretical limits and to develop efficient algorithms, it remains unclear what the optimal complexity is for non-convex decentralized stochastic optimization over time-varying networks. The main difficulties lie in how to gauge the effectiveness of transmitting messages between two nodes over time-varying communication links, and how to establish the lower bound when the network size is fixed (which is a prerequisite in stochastic optimization). This paper resolves these challenges and establishes the first complexity lower bound. We also develop a new decentralized algorithm that nearly attains the lower bound, demonstrating both the tightness of the lower bound and the optimality of our algorithm.
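For concreteness, the problem class in question is standard non-convex decentralized stochastic optimization. The formulation below is the setup commonly assumed in this line of work (an assumption here, since the abstract itself states no formulas): n nodes jointly minimize the average of their local expected losses, while communicating only along the edges of a time-varying graph.

```latex
% Common formulation of non-convex decentralized stochastic optimization
% (an assumed setup; the abstract gives no formulas). Each of the n nodes
% holds a local expected loss f_i and may communicate only along the
% edges of a time-varying graph G_t.
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
f_i(x) = \mathbb{E}_{\xi_i \sim \mathcal{D}_i}\big[ F_i(x;\, \xi_i) \big].
```

Because f is non-convex, "complexity" in this setting is typically measured as the number of stochastic gradient queries and communication rounds needed to reach an ε-stationary point, i.e., a point x with E||∇f(x)|| ≤ ε.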
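To illustrate the algorithmic pattern the abstract describes (local stochastic gradient steps interleaved with gossip averaging over a time-varying topology), here is a minimal sketch. It is not the paper's algorithm: the quadratic losses, the ring/all-to-all topology schedule, and all parameters are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm) of decentralized SGD over a
# time-varying network: each node takes a local stochastic gradient step,
# then averages with its current neighbors via a time-varying,
# doubly-stochastic mixing matrix W_t. Losses and topology are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr, steps = 8, 5, 0.05, 200

# Each node i holds a private loss f_i(x) = 0.5 * ||x - a_i||^2 (assumed).
targets = rng.normal(size=(n, d))

def stochastic_grad(i, x):
    # Noisy gradient of f_i: (x - a_i) plus Gaussian noise.
    return (x - targets[i]) + 0.1 * rng.normal(size=d)

def mixing_matrix(t):
    # Time-varying topology: a sparse ring most rounds, with an occasional
    # all-to-all averaging round, mimicking the local-update /
    # global-average alternation mentioned in the abstract.
    if t % 5 == 4:
        return np.full((n, n), 1.0 / n)      # global average round
    W = np.eye(n) / 2                        # ring with self-weight 1/2
    for i in range(n):
        W[i, (i + 1) % n] += 0.25
        W[i, (i - 1) % n] += 0.25
    return W                                 # doubly stochastic

x = np.zeros((n, d))                         # one iterate per node
for t in range(steps):
    grads = np.stack([stochastic_grad(i, x[i]) for i in range(n)])
    x = mixing_matrix(t) @ (x - lr * grads)  # local step, then gossip

consensus_err = np.linalg.norm(x - x.mean(axis=0))
print(f"consensus error after {steps} steps: {consensus_err:.4f}")
```

The alternation between a sparse ring and a full-averaging round mirrors the federated-learning pattern noted in the abstract, where the time-varying communication schedule switches between local updates and global averaging.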