论文标题
基于优化的块坐标梯度编码,用于缓解分布式学习中的部分散布器
Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning
论文作者
论文摘要
梯度编码方案通过在与所有模型参数相对应的编码局部部分衍生词中引入相同的冗余性,从而有效地减轻了分布式学习中的全面散乱。但是,它们不再对部分散乱者有效,因为它们无法利用部分散散的计算结果。本文旨在设计一种新的梯度编码方案,以减轻分布式学习中的部分散乱者。具体而言,我们考虑了一个由一名主人和n个工人组成的分布式系统,其特征在于一般的部分Straggler模型,并着重于使用梯度编码使用L模型参数解决一般的大规模机器学习问题。首先,我们提出了一个坐标梯度编码方案,该方案具有L编码参数,代表L坐标可能不同的多样性,从而生成大多数梯度编码方案。然后,我们考虑了预期的总运行时的最小化以及相对于坐标的L编码参数的完成概率的最大化,这是挑战离散优化问题的。为了降低计算复杂性,我们首先将每个都转换为等效但简单得多的离散问题,而n \ lll变量代表L坐标的分区为n个块,每个块都具有相同的冗余。这表明具有块n编码参数的等效但更容易实现的块坐标梯度编码方案。然后,我们采用持续放松以进一步降低计算复杂性。为了最小化预期的总运行时,我们开发了计算复杂性o(n^2)的迭代算法以获得最佳解决方案,并通过计算复杂性o(n)得出两个封闭形式的近似解决方案。为了最大化完成概率,我们开发了...的迭代算法
Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model and focuses on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the L coding parameters for coordinates, which are challenging discrete optimization problems. To reduce computational complexity, we first transform each to an equivalent but much simpler discrete problem with N\llL variables representing the partition of the L coordinates into N blocks, each with identical redundancy. This indicates an equivalent but more easily implemented block coordinate gradient coding scheme with N coding parameters for blocks. Then, we adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of expected overall runtime, we develop an iterative algorithm of computational complexity O(N^2) to obtain an optimal solution and derive two closed-form approximate solutions both with computational complexity O(N). For the resultant maximization of the completion probability, we develop an iterative algorithm of...