Paper Title
CDKT-FL: Cross-Device Knowledge Transfer using Proxy Dataset in Federated Learning
Paper Authors
Paper Abstract
In practical settings, how to build robust Federated Learning (FL) systems, in terms of both generalization and personalization abilities, is an important research question. It is a challenging problem due to the non-i.i.d. properties of client data, often referred to as statistical heterogeneity, and the small local data samples drawn from various data distributions. Therefore, to develop robust generalized global and personalized models, conventional FL methods need to redesign knowledge aggregation from biased local models while accounting for the large divergence of learning parameters caused by skewed client data. In this work, we demonstrate that a knowledge transfer mechanism achieves these objectives and develop a novel knowledge distillation-based approach to study the extent of knowledge transfer between the global model and local models. Specifically, our method considers the suitability of transferring the outcome distribution and/or the embedding representation vector of trained models during cross-device knowledge transfer using a small proxy dataset in heterogeneous FL. In doing so, we alternately perform cross-device knowledge transfer following two general formulations: 1) global knowledge transfer and 2) on-device knowledge transfer. Through simulations on three federated datasets, we show that the proposed method achieves significant speedups and high personalized performance for local models. Furthermore, the proposed approach yields a more stable training algorithm than other baselines, with minimal communication data load when exchanging the trained models' outcomes and representations.
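The abstract describes distillation-style transfer of both the outcome distribution (logits) and the embedding representation over a small proxy dataset, applied in two directions (global and on-device). The following is a minimal sketch of that idea, not the authors' reference implementation; the model interface (returning a (logits, representation) pair), the names global_model/local_model/proxy_loader, and the weights alpha and temperature are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD loss: KL divergence between softened output distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)

def transfer_knowledge(student, teacher, proxy_loader, optimizer,
                       alpha=0.5, temperature=2.0):
    """One pass of knowledge transfer on the small proxy dataset.

    Matches the teacher's outcome distribution (logits) and its
    embedding/representation vector. The same routine covers both directions:
      - global knowledge transfer:    student = global model, teacher = local model
      - on-device knowledge transfer: student = local model,  teacher = global model
    Assumes each model returns (logits, representation) for a batch.
    """
    student.train()
    teacher.eval()
    for x, _ in proxy_loader:  # proxy-data labels are not needed for transfer
        with torch.no_grad():
            t_logits, t_repr = teacher(x)
        s_logits, s_repr = student(x)
        loss = distillation_loss(s_logits, t_logits, temperature)
        loss = loss + alpha * F.mse_loss(s_repr, t_repr)  # representation transfer
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because only outputs and representations on the proxy batches are exchanged, rather than full model parameters, the communication load per round stays small, which is the property the abstract highlights.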