Paper Title
Communication-Efficient Federated Distillation
Paper Authors
Paper Abstract
Communication constraints are one of the major challenges preventing the widespread adoption of Federated Learning systems. Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning with fundamentally different communication properties, has emerged. FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set, between the central server and the participating clients. While for conventional Federated Learning algorithms, like Federated Averaging (FA), communication scales with the size of the jointly trained model, in FD it scales with the size of the distillation data set, resulting in advantageous communication properties, especially when large models are trained. In this work, we investigate FD from the perspective of communication efficiency by analyzing the effects of active distillation-data curation, soft-label quantization, and delta-coding techniques. Based on the insights gathered from this analysis, we present Compressed Federated Distillation (CFD), an efficient Federated Distillation method. Extensive experiments on federated image classification and language modeling problems demonstrate that our method can reduce the amount of communication necessary to achieve fixed performance targets by more than two orders of magnitude compared to FD, and by more than four orders of magnitude compared to FA.
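To make the communication pattern described above concrete, the following minimal Python/NumPy sketch illustrates the three ingredients the abstract names: soft labels computed on a shared unlabeled public data set, soft-label quantization, and delta coding of successive rounds. The toy data, the uniform 4-bit quantizer, and all function and variable names are illustrative assumptions for this sketch only, not the paper's actual CFD implementation.

```python
import numpy as np

# Toy soft labels: one probability vector per public-data sample (5 samples, 3 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
soft_labels = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def quantize(p, num_bits=4):
    """Uniformly quantize probabilities to num_bits per entry (illustrative choice)."""
    levels = 2 ** num_bits - 1
    return np.round(p * levels).astype(np.uint8)

def dequantize(q, num_bits=4):
    """Map quantized values back to [0, 1]."""
    levels = 2 ** num_bits - 1
    return q.astype(np.float32) / levels

# Delta coding: transmit only the change relative to the previously communicated round,
# which tends to be small-valued and therefore compresses well.
q_prev = quantize(soft_labels)
q_curr = quantize(np.clip(soft_labels + rng.normal(scale=0.01, size=soft_labels.shape), 0, 1))
delta = q_curr.astype(np.int16) - q_prev.astype(np.int16)

# Server side: average the de-quantized client predictions into an ensemble
# distillation target (here two stand-in "clients" for illustration).
client_predictions = [dequantize(q_curr), dequantize(q_prev)]
distillation_target = np.mean(client_predictions, axis=0)
```

The point of the sketch is the scaling behaviour the abstract highlights: everything a client uploads here is proportional to the number of public samples and classes, not to the number of model parameters, and quantization plus delta coding further shrink each payload.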