Paper Title
Scaling Language Model Size in Cross-Device Federated Learning
Paper Authors
Paper Abstract
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a $21$M-parameter Transformer and a $20.2$M-parameter Conformer that achieve the same or better perplexity than a similarly sized LSTM, with $\sim10\times$ smaller client-to-server communication cost and $11\%$ lower perplexity than the smaller LSTMs commonly studied in the literature.
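The abstract names quantization among the techniques used to shrink client-to-server communication. Below is a minimal, hypothetical sketch of uniform stochastic quantization applied to a client's model update before upload; the function names `quantize_update`/`dequantize_update`, the use of NumPy, and the 8-bit setting are illustrative assumptions, not the paper's actual recipe.

```python
# A minimal sketch (not the paper's implementation) of stochastically
# quantizing a client model update to reduce upload size in federated learning.
import numpy as np

def quantize_update(delta: np.ndarray, num_bits: int = 8):
    """Uniform stochastic quantization of a model delta to 2**num_bits levels."""
    levels = 2 ** num_bits - 1
    lo, hi = float(delta.min()), float(delta.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to a fractional level, then round up or down at random
    # with probability equal to the fractional part (unbiased in expectation).
    normalized = (delta - lo) / scale
    floor = np.floor(normalized)
    q = floor + (np.random.rand(*delta.shape) < (normalized - floor))
    return q.astype(np.uint8), lo, scale  # small ints plus per-tensor metadata

def dequantize_update(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Server-side reconstruction of the (approximate) model delta."""
    return q.astype(np.float32) * scale + lo

# Rough payload arithmetic: a 21M-parameter update is ~84 MB at float32,
# but ~21 MB at 8 bits per weight (plus negligible metadata).
delta = (np.random.randn(1000) * 0.01).astype(np.float32)
q, lo, scale = quantize_update(delta, num_bits=8)
recovered = dequantize_update(q, lo, scale)
print("max reconstruction error:", np.abs(recovered - delta).max())
```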