Paper Title
Byzantine Fault Tolerance in Distributed Machine Learning: A Survey
Paper Authors
Paper Abstract
Byzantine Fault Tolerance (BFT) is one of the most challenging problems in Distributed Machine Learning (DML), defined as the resilience of a fault-tolerant system in the presence of malicious components. Byzantine failures remain difficult to deal with due to their unrestricted nature, which allows arbitrary data to be generated. Significant research effort continues to be devoted to implementing BFT in DML. Some recent studies have considered various BFT approaches in DML, but they are limited in certain respects, such as the small number of approaches analyzed and the lack of a classification of the techniques used in the studied approaches. In this paper, we present a survey of recent work on BFT in DML, focusing mainly on first-order optimization methods, especially Stochastic Gradient Descent (SGD). We highlight key techniques as well as fundamental approaches. We provide an illustrative description of the techniques used for BFT in DML, together with a proposed classification of BFT approaches in terms of their underlying techniques. This classification is based on specific criteria, such as the communication process, optimization method, and topology setting, which also characterize future methods for addressing open challenges.
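To illustrate the kind of technique the survey covers, the sketch below shows coordinate-wise median aggregation, one well-known Byzantine-robust alternative to plain gradient averaging in parameter-server SGD. It is a minimal, assumption-laden example rather than an implementation from any surveyed paper; the worker counts, learning rate, and variable names (`honest`, `byzantine`, `model`, `lr`) are hypothetical.

```python
# Minimal sketch: Byzantine-robust distributed SGD step using a
# coordinate-wise median instead of the mean of worker gradients.
# A minority of arbitrarily corrupted gradients cannot pull each
# coordinate of the median arbitrarily far from the honest values.
import numpy as np

def coordinate_wise_median(gradients: list[np.ndarray]) -> np.ndarray:
    """Aggregate worker gradients with a per-coordinate median."""
    return np.median(np.stack(gradients), axis=0)

# Toy parameter-server step: 5 honest workers plus 2 Byzantine workers
# that report arbitrary vectors (all values here are illustrative).
rng = np.random.default_rng(0)
model = np.zeros(3)
true_grad = np.array([1.0, -2.0, 0.5])

honest = [true_grad + 0.01 * rng.standard_normal(3) for _ in range(5)]
byzantine = [rng.uniform(-100, 100, size=3) for _ in range(2)]

lr = 0.1
model -= lr * coordinate_wise_median(honest + byzantine)
print(model)  # close to -lr * true_grad despite the corrupted gradients
```

With plain averaging, the two corrupted vectors could move the update by an unbounded amount; the median-based rule keeps the step close to the honest gradient direction, which is the basic intuition behind many of the aggregation-based BFT approaches classified in the survey.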