Paper Title

Delay-adaptive step-sizes for asynchronous learning

Paper Authors

Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

Paper Abstract

In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.
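To make the central idea concrete, below is a minimal, hypothetical Python sketch of asynchronous SGD in which each update's step-size shrinks with the staleness of the parameters it was computed from. The inverse-delay rule gamma0 / (1 + tau), and all names in the sketch, are illustrative assumptions, not the paper's actual policies, which are derived from its convergence analysis and also cover proximal incremental gradient and block-coordinate descent methods.

```python
import threading
import numpy as np

def grad(w, xi):
    """Per-sample gradient of the least-squares loss 0.5 * (x @ w - y)**2."""
    x, y = xi
    return (x @ w - y) * x

class DelayAdaptiveSGD:
    """Shared parameters updated asynchronously by several workers."""
    def __init__(self, dim, gamma0=0.05):
        self.w = np.zeros(dim)
        self.version = 0          # counts applied updates (one per gradient)
        self.gamma0 = gamma0
        self.lock = threading.Lock()

    def read(self):
        # Return a (possibly soon-to-be-stale) snapshot and its version.
        with self.lock:
            return self.w.copy(), self.version

    def apply(self, g, read_version):
        with self.lock:
            # Delay is measured on-line: how many updates landed since the read.
            tau = self.version - read_version
            # Assumed inverse-delay rule; no a priori delay bound is needed.
            gamma = self.gamma0 / (1.0 + tau)
            self.w -= gamma * g
            self.version += 1

def worker(model, data, steps, seed):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        w, v = model.read()
        xi = data[rng.integers(len(data))]
        model.apply(grad(w, xi), v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    data = [(x, x @ w_true) for x in rng.normal(size=(200, 5))]
    model = DelayAdaptiveSGD(dim=5)
    threads = [threading.Thread(target=worker, args=(model, data, 500, s))
               for s in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("parameter error:", np.linalg.norm(model.w - w_true))
```

The point the sketch captures is that the delay tau is observable at update time, as the gap between the current model version and the version the worker read, so the step-size can adapt to the actual time-varying delays rather than to a worst-case bound fixed in advance.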
