Paper Title

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Authors

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

Abstract

Recently, diffusion models have emerged as a new paradigm for generative models. Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is under-explored due to the discrete nature of texts, especially for conditional generation. We tackle this challenge by proposing DiffuSeq: a diffusion model designed for sequence-to-sequence (Seq2Seq) text generation tasks. Upon extensive evaluation over a wide range of Seq2Seq tasks, we find DiffuSeq achieving comparable or even better performance than six established baselines, including a state-of-the-art model that is based on pre-trained language models. Apart from quality, an intriguing property of DiffuSeq is its high diversity during generation, which is desired in many Seq2Seq tasks. We further include a theoretical analysis revealing the connection between DiffuSeq and autoregressive/non-autoregressive models. Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks. Code is available at \url{https://github.com/Shark-NLP/DiffuSeq}.
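For intuition about what "diffusion over text" means here, the sketch below shows one simplified training step of a continuous diffusion model for Seq2Seq generation: target tokens are mapped to continuous embeddings, Gaussian noise is mixed in according to a timestep schedule, and a Transformer denoiser conditioned on the source sequence learns to recover the clean embeddings. Everything in this sketch (the ToyDenoiser class, the linear beta schedule, the MSE objective, all shapes and hyperparameters) is an illustrative assumption, not DiffuSeq's actual implementation; see the linked repository for the authors' code.

```python
import torch
import torch.nn as nn

# A minimal, hypothetical sketch of conditional diffusion on word embeddings.
# Not DiffuSeq's real code: names, schedule, and objective are simplified.

class ToyDenoiser(nn.Module):
    def __init__(self, vocab_size=1000, dim=64, max_t=2000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)       # tokens -> continuous space
        self.time_embed = nn.Embedding(max_t, dim)       # diffusion timestep embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(dim, dim)

    def forward(self, z_t, cond, t):
        # Condition by concatenating clean source embeddings with the noisy
        # target embeddings along the sequence axis (a single-encoder setup).
        h = torch.cat([cond, z_t], dim=1) + self.time_embed(t)[:, None, :]
        h = self.encoder(h)
        return self.out(h[:, cond.size(1):])             # predict clean target part

max_t = 2000
betas = torch.linspace(1e-4, 0.02, max_t)                # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

model = ToyDenoiser(max_t=max_t)
src = torch.randint(0, 1000, (8, 16))                    # source token ids
tgt = torch.randint(0, 1000, (8, 16))                    # target token ids

z0 = model.embed(tgt)                                    # clean target embeddings
t = torch.randint(0, max_t, (8,))                        # random timestep per sample
noise = torch.randn_like(z0)
a = alpha_bar[t].sqrt()[:, None, None]
s = (1.0 - alpha_bar[t]).sqrt()[:, None, None]
z_t = a * z0 + s * noise                                 # forward process q(z_t | z_0)

pred = model(z_t, model.embed(src), t)                   # denoise, conditioned on source
loss = ((pred - z0) ** 2).mean()                         # MSE on embeddings
loss.backward()
```

Concatenating source and noisy target embeddings lets one encoder model the conditional distribution without a separate cross-attention decoder; at inference time, generation would start from pure noise on the target positions and iteratively denoise, which is also where the sampling diversity noted in the abstract comes from.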
