Paper Title
Deliberation Networks and How to Train Them
Paper Authors
Paper Abstract
Deliberation networks are a family of sequence-to-sequence models that have achieved state-of-the-art performance in a wide range of tasks, such as machine translation and speech synthesis. A deliberation network consists of multiple standard sequence-to-sequence models, each conditioned on the initial input and the output of the previous model. During training, several key questions arise: whether to apply Monte Carlo approximation to the gradients or to the loss, whether to train the standard models jointly or separately, whether to run an intermediate model in teacher-forcing or free-running mode, and whether to apply task-specific techniques. Previous work on deliberation networks typically explores only one or two training options for a specific task. This work introduces a unifying framework that covers the various training options and addresses the above questions. In general, it is simpler to approximate the gradients. When parallel training is essential, separate training should be adopted. Regardless of the task, the intermediate model should be run in free-running mode. For tasks where the output is continuous, a guided attention loss can be used to prevent degradation into a standard model.
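To make the architecture and the free-running intermediate pass concrete, the following is a minimal illustrative PyTorch sketch, not the authors' implementation: the module names, the shared toy encoder, the greedy free-running first pass, and the choice of token 0 as BOS are all assumptions made for brevity.

```python
# Minimal sketch of a two-pass deliberation network (illustrative, not the paper's code).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):                        # x: (B, T) token ids
        h, _ = self.rnn(self.emb(x))             # (B, T, dim)
        return h

class Decoder(nn.Module):
    """Attends over one or more context sequences: the input encoding and,
    for the second pass, an encoding of the first-pass output."""
    def __init__(self, vocab, dim, n_contexts):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, 1, batch_first=True) for _ in range(n_contexts))
        self.rnn = nn.GRU(dim + n_contexts * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, y_prev, contexts):         # y_prev: (B, Ty)
        q = self.emb(y_prev)
        ctx = [a(q, c, c)[0] for a, c in zip(self.attn, contexts)]
        h, _ = self.rnn(torch.cat([q] + ctx, dim=-1))
        return self.out(h)                       # logits: (B, Ty, vocab)

class DeliberationNetwork(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.encoder = Encoder(vocab, dim)
        self.first = Decoder(vocab, dim, n_contexts=1)   # standard seq2seq pass
        self.second = Decoder(vocab, dim, n_contexts=2)  # conditioned on x and y1

    def forward(self, x, y_in):
        enc = self.encoder(x)
        # First pass in free-running mode: feed back its own greedy predictions.
        # Note the argmax breaks the gradient path, which is why the paper discusses
        # Monte Carlo approximation of the gradients versus the loss.
        y1 = torch.zeros_like(y_in[:, :1])                # BOS = 0 (assumption)
        for _ in range(y_in.size(1)):
            logits1 = self.first(y1, [enc])
            y1 = torch.cat([y1[:, :1], logits1.argmax(-1)], dim=1)
        # Re-encode the first-pass tokens (a simplification; one could instead
        # let the second decoder attend over the first decoder's hidden states).
        y1_enc = self.encoder(y1[:, 1:])
        # Second pass conditioned on the initial input and the first-pass output.
        logits2 = self.second(y_in, [enc, y1_enc])
        return logits1, logits2

# Usage: both passes produce logits, so separate or joint losses can be applied.
model = DeliberationNetwork()
x = torch.randint(1, 100, (2, 7))                # toy source batch
y = torch.randint(1, 100, (2, 5))                # toy target batch (teacher input)
logits1, logits2 = model(x, y)
```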