Paper Title

BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining

Paper Authors

Weizhen Qi, Yeyun Gong, Jian Jiao, Yu Yan, Weizhu Chen, Dayiheng Liu, Kewen Tang, Houqiang Li, Jiusheng Chen, Ruofei Zhang, Ming Zhou, Nan Duan

Paper Abstract

In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be uniformly regarded as differing in the extent to which previous tokens can be attended to, and BANG bridges AR and NAR generation by designing a novel model structure for large-scale pretraining. The pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements. Experiments on question generation (SQuAD 1.1), summarization (XSum), and dialogue generation (PersonaChat) show that BANG significantly improves NAR and semi-NAR performance while attaining performance comparable to strong AR pretrained models. Compared with strong semi-NAR baselines, BANG achieves absolute improvements of 14.01 and 5.24 in the overall scores of SQuAD 1.1 and XSum, respectively. In addition, compared with strong NAR baselines, BANG achieves absolute improvements of 10.73, 6.39, and 5.90 in the overall scores of SQuAD 1.1, XSum, and PersonaChat, respectively.
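The abstract's unifying view, that AR and NAR generation differ only in how many previous target tokens each position may attend to, can be illustrated with a decoder self-attention mask. The sketch below is a minimal toy illustration in PyTorch, not BANG's actual pretraining architecture; the function name `decoder_attention_mask` and the single `visible_prev` knob are simplifications introduced here for clarity.

```python
import torch

def decoder_attention_mask(tgt_len: int, visible_prev: int) -> torch.Tensor:
    """Build a boolean decoder self-attention mask over target positions.

    visible_prev controls how many previous target tokens each position
    may attend to: tgt_len - 1 yields a full causal (AR) mask, 0 yields a
    fully non-autoregressive (NAR) mask where each position sees only
    itself, and intermediate values give semi-NAR behavior.
    True means attention is allowed.
    """
    mask = torch.zeros(tgt_len, tgt_len, dtype=torch.bool)
    for i in range(tgt_len):
        lo = max(0, i - visible_prev)
        # Each position attends to itself plus up to `visible_prev`
        # preceding target positions.
        mask[i, lo : i + 1] = True
    return mask

# AR: position i attends to all positions <= i (standard causal mask).
ar_mask = decoder_attention_mask(4, visible_prev=3)
# NAR: predictions are made independently of previously generated tokens.
nar_mask = decoder_attention_mask(4, visible_prev=0)
# Semi-NAR: only a limited window of previous tokens is visible.
semi_nar_mask = decoder_attention_mask(4, visible_prev=1)
print(ar_mask, nar_mask, semi_nar_mask, sep="\n\n")
```

Under this toy view, a single pretrained decoder can serve all three regimes simply by swapping the mask at inference time, which is the kind of flexibility the abstract claims for the pretrained BANG model.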
