Paper Title

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence

Paper Authors

Shayan Talaei, Matin Ansaripour, Giorgi Nadiradze, Dan Alistarh

Paper Abstract

Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some computationally-bounded nodes may not be able to implement first-order, gradient-based optimization, while they could still contribute to joint optimization tasks. In this paper, we initiate the study of hybrid decentralized optimization, studying settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system, and attempt to jointly solve an optimization task over some data distribution. We essentially show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process, rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly-biased gradient estimators, which may be of independent interest. Our results hold for both convex and non-convex objectives. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first-zeroth order optimization can be practical, even when training deep neural networks.
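
The abstract describes a setting in which first-order workers contribute gradient information while computationally bounded workers contribute noisier zeroth-order (function-value-based) estimates, and both kinds of information are combined into a joint update. The sketch below illustrates that idea on a toy convex quadratic; it is not the paper's algorithm, and the two-point zeroth-order estimator, worker counts, and step size used here are illustrative assumptions.

```python
# Minimal sketch of hybrid first-/zeroth-order optimization (illustrative only):
# some "workers" return exact gradients, others return two-point zeroth-order
# estimates, and one synchronous round averages all of them into a single step.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
A = A.T @ A / d + np.eye(d)            # well-conditioned PSD matrix -> convex objective
b = rng.standard_normal(d)

def f(x):
    # Convex quadratic objective f(x) = 0.5 x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    # Exact gradient, as computed by a first-order worker
    return A @ x - b

def zo_grad(x, mu=1e-4):
    # Two-point zeroth-order estimate along a random unit direction u:
    # d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u  (unbiased up to O(mu^2) smoothing error)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return d * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def hybrid_step(x, n_first=4, n_zeroth=4, lr=0.05):
    # Stand-in for one round of decentralized averaging over both worker types
    g_first = [grad(x) for _ in range(n_first)]
    g_zeroth = [zo_grad(x) for _ in range(n_zeroth)]
    g = np.mean(g_first + g_zeroth, axis=0)
    return x - lr * g

x = np.zeros(d)
for _ in range(200):
    x = hybrid_step(x)
print("final objective:", f(x))
```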
