Paper Title
TorchScale: Transformers at Scale
Paper Authors
Paper Abstract
Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several modeling techniques, which can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.
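As a rough illustration of the kind of usage the toolkit targets, the sketch below builds a Transformer encoder from a configuration object and enables DeepNorm-style training stabilization. The module paths, `EncoderConfig`, `Encoder`, and the `deepnorm` flag follow the library's documented examples at the time of writing and may differ across versions; treat this as an assumption-laden sketch rather than a definitive reference.

```python
# Minimal usage sketch (assumes `pip install torchscale`); module paths and
# config flags are taken from the library's documented examples and may vary
# across versions.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# Configure a Transformer encoder; deepnorm=True is intended to enable the
# DeepNet-style normalization that stabilizes training at large depth.
config = EncoderConfig(vocab_size=64000, deepnorm=True)
model = Encoder(config)

print(model)
```

Decoder and encoder-decoder variants are constructed analogously from their own config classes, so the same pattern extends to language modeling and machine translation setups.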