Paper Title

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence

Paper Authors

Shayan Talaei, Matin Ansaripour, Giorgi Nadiradze, Dan Alistarh

Paper Abstract

Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some computationally-bounded nodes may not be able to implement first-order, gradient-based optimization, while they could still contribute to joint optimization tasks. In this paper, we initiate the study of hybrid decentralized optimization, studying settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system, and attempt to jointly solve an optimization task over some data distribution. We essentially show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process, rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly-biased gradient estimators, which may be of independent interest. Our results hold for both convex and non-convex objectives. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first-zeroth order optimization can be practical, even when training deep neural networks.
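
The abstract describes a setting in which first-order workers contribute gradient information while computationally bounded workers contribute noisier zeroth-order (function-value-based) estimates, and both kinds of information are combined into a joint update. The sketch below illustrates that idea on a toy convex quadratic; it is not the paper's algorithm, and the two-point zeroth-order estimator, worker counts, and step size used here are illustrative assumptions.

```python
# Minimal sketch of hybrid first-/zeroth-order optimization (illustrative only):
# some "workers" return exact gradients, others return two-point zeroth-order
# estimates, and one synchronous round averages all of them into a single step.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
A = A.T @ A / d + np.eye(d)            # well-conditioned PSD matrix -> convex objective
b = rng.standard_normal(d)

def f(x):
    # Convex quadratic objective f(x) = 0.5 x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    # Exact gradient, as computed by a first-order worker
    return A @ x - b

def zo_grad(x, mu=1e-4):
    # Two-point zeroth-order estimate along a random unit direction u:
    # d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u  (unbiased up to O(mu^2) smoothing error)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return d * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def hybrid_step(x, n_first=4, n_zeroth=4, lr=0.05):
    # Stand-in for one round of decentralized averaging over both worker types
    g_first = [grad(x) for _ in range(n_first)]
    g_zeroth = [zo_grad(x) for _ in range(n_zeroth)]
    g = np.mean(g_first + g_zeroth, axis=0)
    return x - lr * g

x = np.zeros(d)
for _ in range(200):
    x = hybrid_step(x)
print("final objective:", f(x))
```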
