Paper Title
Scaling Language Model Size in Cross-Device Federated Learning
Paper Authors
Paper Abstract
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a $21$M-parameter Transformer and a $20.2$M-parameter Conformer that achieve the same or better perplexity than a similarly sized LSTM, with $\sim10\times$ smaller client-to-server communication cost and $11\%$ lower perplexity than the smaller LSTMs commonly studied in the literature.
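The abstract names quantization among the techniques used to shrink client-to-server communication. Below is a minimal, hypothetical sketch of uniform stochastic quantization applied to a client's model update before upload; the function names `quantize_update`/`dequantize_update`, the use of NumPy, and the 8-bit setting are illustrative assumptions, not the paper's actual recipe.

```python
# A minimal sketch (not the paper's implementation) of stochastically
# quantizing a client model update to reduce upload size in federated learning.
import numpy as np

def quantize_update(delta: np.ndarray, num_bits: int = 8):
    """Uniform stochastic quantization of a model delta to 2**num_bits levels."""
    levels = 2 ** num_bits - 1
    lo, hi = float(delta.min()), float(delta.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to a fractional level, then round up or down at random
    # with probability equal to the fractional part (unbiased in expectation).
    normalized = (delta - lo) / scale
    floor = np.floor(normalized)
    q = floor + (np.random.rand(*delta.shape) < (normalized - floor))
    return q.astype(np.uint8), lo, scale  # small ints plus per-tensor metadata

def dequantize_update(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Server-side reconstruction of the (approximate) model delta."""
    return q.astype(np.float32) * scale + lo

# Rough payload arithmetic: a 21M-parameter update is ~84 MB at float32,
# but ~21 MB at 8 bits per weight (plus negligible metadata).
delta = (np.random.randn(1000) * 0.01).astype(np.float32)
q, lo, scale = quantize_update(delta, num_bits=8)
recovered = dequantize_update(q, lo, scale)
print("max reconstruction error:", np.abs(recovered - delta).max())
```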