Paper Title

Federated Learning of Large Models at the Edge via Principal Sub-Model Training

Paper Authors

Yue Niu, Saurav Prakash, Souvik Kundu, Sunwoo Lee, Salman Avestimehr

Paper Abstract

Federated Learning (FL) is emerging as a popular and promising decentralized learning framework that enables collaborative training among clients without sharing private data between them or with a centralized server. However, since many edge clients do not have sufficient computing, memory, or communication capabilities, federated learning of large models still faces significant bottlenecks. To keep such weak but crucial clients in the loop, prior works either consider a heterogeneous-client setting in which clients train models of different sizes, or offload training to the server. However, the heterogeneous-client setting requires some clients to train the full model, which does not align with the resource-constrained setting, while offloading breaks the privacy promises of FL by sharing intermediate representations or labels with the server. To overcome these limitations, in this work we formulate a realistic, but much less explored, cross-device FL setting in which no client can train a full large model and no client is willing to share any intermediate information with the remote server. Under this formulation, we develop a principal sub-model (PriSM) training methodology that collaboratively trains a full large model while assigning each client a small sub-model that is a probabilistic low-rank approximation of the full server model. When creating sub-models, PriSM first performs a principal kernel analysis in the orthogonal kernel space to obtain the importance of each kernel. PriSM then adopts a novel importance-aware sampling process to select a subset of kernels (i.e., a kernel with higher importance is assigned a higher sampling probability). This sampling process ensures that each sub-model remains a low-rank approximation of the full model, while all sub-models together achieve nearly full coverage of the principal kernels.
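The abstract describes two steps: obtaining per-kernel importance in an orthogonal kernel space, and importance-aware sampling of a kernel subset for each client. Below is a minimal sketch of these ideas, not the authors' released implementation: it assumes PyTorch, assumes the orthogonal decomposition is done via SVD of the flattened convolution weights (with singular values as importance scores), and the `keep_ratio` and `smoothing` parameters are hypothetical names introduced here for illustration.

```python
import torch

def principal_kernel_importance(conv_weight):
    """Decompose conv kernels (out_ch, in_ch, k, k) into an orthogonal
    kernel space via SVD; singular values act as importance scores."""
    out_ch = conv_weight.shape[0]
    W = conv_weight.reshape(out_ch, -1)              # flatten each kernel
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U, S, Vh                                  # S: importance per orthogonal kernel

def sample_sub_model_kernels(S, keep_ratio=0.25, smoothing=0.5):
    """Importance-aware sampling: kernels with larger singular values get
    higher sampling probability. `smoothing` mixes in a uniform distribution
    so low-importance kernels are still covered across clients
    (hypothetical parameterization, not from the paper)."""
    probs = S / S.sum()
    probs = smoothing * probs + (1 - smoothing) * torch.full_like(probs, 1.0 / len(S))
    num_keep = max(1, int(keep_ratio * len(S)))
    idx = torch.multinomial(probs, num_keep, replacement=False)
    return idx.sort().values                         # kernel indices for this client

# Example: build a low-rank sub-layer weight for one client
weight = torch.randn(64, 32, 3, 3)                   # a full conv layer's weight
U, S, Vh = principal_kernel_importance(weight)
idx = sample_sub_model_kernels(S, keep_ratio=0.25)
sub_weight = (U[:, idx] * S[idx]) @ Vh[idx]          # low-rank approximation
sub_weight = sub_weight.reshape_as(weight)
```

In this sketch, each client receives a different sampled index set, so the sub-models jointly cover nearly all principal kernels while each individual sub-model stays small.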
