Paper Title

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

Paper Authors

Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

Paper Abstract

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originating from ELECTRA, this training strategy has demonstrated sample efficiency in pretraining models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art results on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, METRO-LM models are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining costs.
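
To make the training strategy described in the abstract concrete, below is a minimal sketch of an ELECTRA-style objective in which a small auxiliary generator produces the denoising signal for the main autoencoding model. This is not the full METRO recipe from the paper; the module names (TinyEncoder, metro_style_step), model sizes, masking rate, sampling scheme, and loss weight are placeholder assumptions for illustration only.

```python
# Illustrative sketch of an ELECTRA-style objective with model-generated signals.
# NOT the METRO recipe itself; all sizes and hyperparameters are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MASK_ID, MASK_RATE = 30522, 128, 103, 0.15  # assumed toy values

class TinyEncoder(nn.Module):
    """Stand-in Transformer encoder: embedding plus one self-attention layer."""
    def __init__(self, vocab, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, ids):
        return self.encoder(self.embed(ids))

generator = TinyEncoder(VOCAB, HIDDEN)      # small auxiliary model
gen_head = nn.Linear(HIDDEN, VOCAB)         # predicts tokens at masked positions
discriminator = TinyEncoder(VOCAB, HIDDEN)  # the main model being pretrained
disc_head = nn.Linear(HIDDEN, 1)            # replaced-token detection head

def metro_style_step(input_ids):
    # 1. Mask a random subset of positions (MLM-style corruption).
    mask = torch.rand(input_ids.shape) < MASK_RATE
    corrupted = input_ids.masked_fill(mask, MASK_ID)

    # 2. The auxiliary generator fills the masked positions; its predictions
    #    become the "model generated signals" the main model trains against.
    gen_logits = gen_head(generator(corrupted))
    gen_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])
    with torch.no_grad():
        sampled = gen_logits.argmax(-1)  # greedy choice, for brevity
    mixed = torch.where(mask, sampled, input_ids)

    # 3. The main model predicts, per token, whether the generator replaced it.
    labels = (mixed != input_ids).float()
    disc_logits = disc_head(discriminator(mixed)).squeeze(-1)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, labels)

    # 4. Joint loss; the discriminator term is typically up-weighted.
    return gen_loss + 50.0 * disc_loss

loss = metro_style_step(torch.randint(5, VOCAB, (2, 16)))
loss.backward()
```

Because the discriminator receives a loss at every position rather than only at masked ones, this style of objective tends to be more sample-efficient than plain masked language modeling, which is the property the abstract refers to.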
