Paper Title


A Study of Non-autoregressive Model for Sequence Generation

Authors

Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu

Abstract


Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel, resulting in faster generation speed compared to their autoregressive (AR) counterparts, but at the cost of lower accuracy. Different techniques, including knowledge distillation and source-target alignment, have been proposed to bridge the gap between AR and NAR models in various tasks such as neural machine translation (NMT), automatic speech recognition (ASR), and text to speech (TTS). With the help of those techniques, NAR models can catch up with the accuracy of AR models in some tasks but not in others. In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) why can NAR models catch up with AR models in some tasks but not all? (2) why do techniques like knowledge distillation and source-target alignment help NAR models? Since the main difference between AR and NAR models is that NAR models do not use dependency among target tokens while AR models do, intuitively the difficulty of NAR sequence generation heavily depends on the strength of the dependency among target tokens. To quantify such dependency, we propose an analysis model called CoMMA to characterize the difficulty of different NAR sequence generation tasks. We have several interesting findings: 1) among the NMT, ASR, and TTS tasks, ASR has the most target-token dependency while TTS has the least; 2) knowledge distillation reduces the target-token dependency in the target sequence and thus improves the accuracy of NAR models; 3) the source-target alignment constraint encourages dependency of a target token on source tokens and thus eases the training of NAR models.
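The abstract's central claim is that NAR difficulty grows with the strength of dependency among target tokens. A minimal, self-contained sketch (a hypothetical toy example, not code from the paper) makes this concrete: when two valid targets "A B" and "B A" are equally likely, an AR factorization p(y1)·p(y2|y1) recovers the true distribution, while a NAR factorization of independent per-position marginals leaks probability mass onto invalid mixed outputs.

```python
from itertools import product

# Toy target distribution with strong inter-token dependency: the only valid
# outputs are ("A","B") and ("B","A"), each with probability 0.5.
# (Hypothetical data chosen to illustrate the abstract's argument.)
probs = {("A", "B"): 0.5, ("B", "A"): 0.5}

def ar_prob(seq):
    """AR factorization p(y1) * p(y2 | y1): exact for this distribution."""
    p_y1 = sum(p for t, p in probs.items() if t[0] == seq[0])
    joint = probs.get(seq, 0.0)
    return 0.0 if p_y1 == 0 else p_y1 * (joint / p_y1)

def nar_prob(seq):
    """NAR factorization: product of independent per-position marginals."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= sum(pr for t, pr in probs.items() if t[i] == tok)
    return p

for seq in product("AB", repeat=2):
    print(seq, ar_prob(seq), nar_prob(seq))
# AR assigns 0.5 to each valid output and 0 elsewhere, while NAR assigns
# 0.25 to every output, including the invalid ("A","A") and ("B","B").
```

The 0.25 mass NAR places on invalid outputs is exactly the failure mode that grows with target-token dependency; it also suggests why knowledge distillation helps, since a teacher's decoded outputs tend to collapse such multimodal data toward a single mode that a position-independent factorization can fit.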
