Paper Title
Holdout SGD: Byzantine Tolerant Federated Learning
Paper Authors
Paper Abstract
This work presents a new distributed Byzantine-tolerant federated learning algorithm, HoldOut SGD, for Stochastic Gradient Descent (SGD) optimization. HoldOut SGD uses the well-known machine-learning technique of holdout estimation, in a distributed fashion, in order to select parameter updates that are likely to lead to models with low loss values. This makes it more effective at discarding the inputs of Byzantine workers than existing methods that eliminate outliers in the parameter space of the learned model. HoldOut SGD first randomly selects a set of workers that use their private data to propose gradient updates. Next, a voting committee of workers is randomly selected, and each voter uses its private data as holdout data to select the best proposals via a voting scheme. We propose two possible mechanisms for coordinating the workers in the distributed computation of HoldOut SGD. The first uses a truthful central server and corresponds to the typical setting of current federated learning. The second is fully distributed and requires no central server, paving the way to fully decentralized federated learning. The fully distributed version implements HoldOut SGD via ideas from the blockchain domain, specifically the Algorand committee-selection and consensus processes. We provide formal guarantees for the HoldOut SGD process in terms of its convergence to the optimal model and its level of resilience to the fraction of Byzantine workers. Empirical evaluation shows that HoldOut SGD is Byzantine-resilient and converges efficiently to an effective model for deep-learning tasks, as long as the total number of participating workers is large and the fraction of Byzantine workers is less than half (less than 1/3 for the fully distributed variant).
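To make the propose-vote-aggregate round concrete, the following is a minimal, hypothetical Python simulation of one HoldOut SGD round. It assumes details not specified in the abstract: `loss` and `loss_grad` are user-supplied callables, each committee member votes for the `top_k` proposals with the lowest loss on its own holdout data, and the most-voted proposals are averaged. The paper's exact voting rule, aggregation step, and the Algorand-style committee sortition of the decentralized variant are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def holdout_sgd_round(w, workers_data, loss, loss_grad,
                      n_proposers, n_committee, lr=0.1, top_k=None):
    """One simulated round: propose, vote on holdout data, aggregate.

    Illustrative sketch only; the voting and aggregation rules are
    simplifying assumptions, not the paper's exact protocol.
    """
    assert n_proposers + n_committee <= len(workers_data)
    ids = rng.permutation(len(workers_data))
    proposers = ids[:n_proposers]
    committee = ids[n_proposers:n_proposers + n_committee]

    # Each proposer computes a candidate model from its private data.
    proposals = [w - lr * loss_grad(w, workers_data[p]) for p in proposers]

    # Each committee member scores every proposal on its own private
    # (holdout) data and votes for the top_k with the lowest loss.
    k = top_k if top_k is not None else max(1, len(proposals) // 2)
    votes = np.zeros(len(proposals))
    for c in committee:
        scores = [loss(prop, workers_data[c]) for prop in proposals]
        for j in np.argsort(scores)[:k]:
            votes[j] += 1

    # Aggregate the most-voted proposals; Byzantine proposals that
    # inflate the holdout loss receive few votes and are discarded.
    winners = np.argsort(-votes)[:k]
    return np.mean([proposals[j] for j in winners], axis=0)

Because voting is based on holdout loss rather than on distances between updates, a Byzantine proposal is rejected whenever it degrades the loss observed by honest committee members, even if it lies close to honest updates in parameter space.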