Paper Title

ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation

Paper Authors

Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen

Paper Abstract

We study the text generation task under the approach of pre-trained language models (PLMs). Typically, an auto-regressive (AR) method is adopted for generating texts in a token-by-token manner. Despite many advantages of AR generation, it usually suffers from inefficient inference. Therefore, non-autoregressive (NAR) models are proposed to generate all target tokens simultaneously. However, NAR models usually generate texts of lower quality due to the absence of token dependency in the output text. In this paper, we propose ELMER: an efficient and effective PLM for NAR text generation, to explicitly model the token dependency during NAR generation. By leveraging the early exit technique, ELMER enables token generation at different layers, according to prediction confidence (a more confident token will exit at a lower layer). Besides, we propose a novel pre-training objective, Layer Permutation Language Modeling, to pre-train ELMER by permuting the exit layer for each token in sequences. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and further narrows the performance gap with AR PLMs (e.g., ELMER (29.92) vs. BART (30.61) ROUGE-L on XSUM) while achieving over 10x inference speedup.
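To make the early-exit idea in the abstract more concrete, below is a minimal PyTorch sketch, not the authors' implementation: the layer count, hidden size, shared LM head, and the 0.9 confidence threshold are all illustrative assumptions. It only covers inference-time confidence-based exit for a non-autoregressive decoder; ELMER's Layer Permutation Language Modeling pre-training (permuting each token's exit layer) and the exact treatment of exited hidden states in higher layers are omitted.

```python
# Illustrative sketch only (assumed hyperparameters, not from the paper).
import torch
import torch.nn as nn


class EarlyExitNARDecoder(nn.Module):
    """Toy NAR decoder: each position commits to a token ("exits") at the
    first layer whose prediction confidence clears a threshold."""

    def __init__(self, vocab_size=32000, d_model=256, n_layers=6, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_layers)
        ])
        # A single LM head shared by all layers, so every layer can emit tokens.
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.threshold = threshold

    @torch.no_grad()
    def generate(self, hidden):
        # hidden: (batch, seq_len, d_model), e.g. length-expanded encoder output.
        batch, seq_len, _ = hidden.shape
        tokens = torch.full((batch, seq_len), -1, dtype=torch.long)
        exited = torch.zeros(batch, seq_len, dtype=torch.bool)

        for layer in self.layers:
            hidden = layer(hidden)
            probs = self.lm_head(hidden).softmax(dim=-1)
            conf, pred = probs.max(dim=-1)  # both (batch, seq_len)
            # A position exits at the first layer where its confidence
            # reaches the threshold; its token is then frozen.
            newly_exited = (conf >= self.threshold) & ~exited
            tokens[newly_exited] = pred[newly_exited]
            exited |= newly_exited

        # Positions that never reached the threshold exit at the top layer.
        tokens[~exited] = pred[~exited]
        return tokens


decoder = EarlyExitNARDecoder().eval()
dummy = torch.randn(2, 16, 256)        # 2 sequences, 16 target slots
print(decoder.generate(dummy).shape)   # torch.Size([2, 16])
```

Because every position is predicted in parallel and confident positions stop passing through higher layers, inference cost drops relative to token-by-token AR decoding, which is the efficiency argument the abstract makes.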
