Paper Title

Scalable Training of Language Models using JAX pjit and TPUv4

Authors

Joanna Yoo, Kuba Perlin, Siddhartha Rao Kamalakara, João G. M. Araújo

Abstract

Modern large language models require distributed training strategies due to their size. The challenges of efficiently and robustly training them are met with rapid developments on both software and hardware frontiers. In this technical report, we explore challenges and design decisions associated with developing a scalable training framework, and present a quantitative analysis of efficiency improvements coming from adopting new software and hardware solutions.
