Paper Title

ByzShield: An Efficient and Robust System for Distributed Training

Authors

Konstantinos Konstantinidis, Aditya Ramamoorthy

Abstract

Training large-scale models on distributed clusters is a critical component of the machine learning pipeline. However, this training can easily be made to fail if some workers behave in an adversarial (Byzantine) fashion and return arbitrary results to the parameter server (PS). Many existing papers consider a variety of attack models and propose robust aggregation and/or computational redundancy to alleviate the effects of these attacks. In this work, we consider an omniscient attack model in which the adversary has full knowledge of the workers' gradient computation assignments and can choose to attack up to q of the K worker nodes to induce maximal damage. Our redundancy-based method, ByzShield, leverages the properties of bipartite expander graphs to assign tasks to workers; this effectively mitigates the effect of the Byzantine behavior. Specifically, we demonstrate an upper bound on the worst-case fraction of corrupted gradients based on the eigenvalues of our constructions, which are based on mutually orthogonal Latin squares and Ramanujan graphs. Our numerical experiments indicate an average reduction of over 36% in the fraction of corrupted gradients compared to the state of the art. Likewise, our experiments on training followed by image classification on the CIFAR-10 dataset show that ByzShield has, on average, a 20% accuracy advantage under the most sophisticated attacks. ByzShield also tolerates a much larger fraction of adversarial nodes than prior work.
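To make the omniscient attack model concrete, the following is a minimal sketch, not the authors' implementation. It assumes a textbook construction of mutually orthogonal Latin squares for a prime order l, namely L_k(i, j) = (k*i + j) mod l, and assumes that the PS resolves each gradient by majority vote over its r replicas (r odd). The function names `mols_assignment` and `worst_case_corrupted` are hypothetical. The sketch assigns l^2 files to K = r*l workers and then brute-forces every choice of q Byzantine workers to find the largest number of files whose majority vote can be corrupted.

```python
from itertools import combinations


def mols_assignment(l, r):
    """Assign l*l files to r*l workers using r Latin squares of prime order l.

    Assumed construction: L_k(i, j) = (k*i + j) % l for k = 1..r.
    Worker (k, s) receives every file (i, j) whose entry in square k equals s.
    """
    if l < 2 or any(l % d == 0 for d in range(2, int(l ** 0.5) + 1)):
        raise ValueError("l must be prime for this construction")
    if not 1 <= r <= l - 1:
        raise ValueError("r must satisfy 1 <= r <= l - 1")
    workers = {}
    for k in range(1, r + 1):
        for s in range(l):
            # Solve k*i + j = s (mod l) for j, one file per row i.
            workers[(k, s)] = {(i, (s - k * i) % l) for i in range(l)}
    return workers


def worst_case_corrupted(workers, r, q):
    """Exhaustively search all q-subsets of workers (omniscient adversary).

    A file counts as corrupted when a strict majority of its r replicas
    are held by Byzantine workers. Returns the maximum number of such files.
    """
    majority = r // 2 + 1
    best = 0
    for byzantine in combinations(workers, q):
        hits = {}
        for w in byzantine:
            for f in workers[w]:
                hits[f] = hits.get(f, 0) + 1
        corrupted = sum(1 for c in hits.values() if c >= majority)
        best = max(best, corrupted)
    return best


if __name__ == "__main__":
    l, r = 5, 3  # 25 files, 15 workers, replication factor 3
    workers = mols_assignment(l, r)
    for q in (2, 3):
        c = worst_case_corrupted(workers, r, q)
        print(f"q={q}: worst case {c} of {l * l} files corrupted")
```

The exhaustive search is exponential in q and is only practical for small clusters; it is meant to illustrate what "worst-case fraction of corrupted gradients" measures, whereas the abstract describes bounding that quantity through the eigenvalues of the assignment graph.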
