Paper Title
DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder
Paper Authors
Paper Abstract
Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models because NAT decoders do not depend on previous target tokens in the decoder input. We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input. First, we propose an autoregressive forward-backward pre-training phase before NAT training, which enables the NAT decoder to gradually learn bidirectional target dependencies for the final NAT training. Second, we transform the decoder input from the source language representation space to the target language representation space through a novel attentive transformation process, which enables the decoder to better capture target dependencies. DePA can be applied to any fully NAT model. Extensive experiments show that DePA consistently improves highly competitive and state-of-the-art fully NAT models on widely used WMT and IWSLT benchmarks by up to 1.88 BLEU, while maintaining inference latency comparable to other fully NAT models.
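To make the forward-backward pre-training phase concrete: a natural way to realize such a curriculum is with three self-attention masks over target positions, moving from left-to-right (AT-style) to right-to-left and finally to fully bidirectional (NAT-style) attention. The sketch below is our own minimal illustration of that idea, not the authors' implementation; the function name and staging are assumptions.

```python
import torch

def dependency_masks(tgt_len: int):
    """Hypothetical sketch of the three decoder self-attention masks
    a forward-backward pre-training curriculum might use:
    causal (forward), anti-causal (backward), then unrestricted (NAT).
    Entries are True where attention is allowed."""
    idx = torch.arange(tgt_len)
    # Position i may attend to positions j <= i (left-to-right, AT-style).
    forward = idx.unsqueeze(0) <= idx.unsqueeze(1)
    # Position i may attend to positions j >= i (right-to-left).
    backward = idx.unsqueeze(0) >= idx.unsqueeze(1)
    # Every position attends to every position (bidirectional, NAT training).
    bidirectional = torch.ones(tgt_len, tgt_len, dtype=torch.bool)
    return forward, backward, bidirectional
```

Training would then proceed through the masks in that order, so the decoder first learns left-to-right dependencies, then right-to-left ones, before the final fully NAT stage with unrestricted attention.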
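The attentive transformation of the decoder input can likewise be pictured as attention from the (source-derived) decoder input over the target embedding table, re-expressing each position as a mixture of target embeddings. The following is a rough sketch under that assumption; the paper's exact formulation (e.g., learned projections or normalization) may differ, and all names here are ours.

```python
import torch
import torch.nn.functional as F

def attentive_transform(decoder_input: torch.Tensor,
                        tgt_embed_table: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of mapping the decoder input from the
    source representation space into the target representation space.

    decoder_input:   (batch, tgt_len, d_model) source-derived inputs
                     (e.g., copied/upsampled encoder representations)
    tgt_embed_table: (vocab_size, d_model) target embedding matrix
    """
    d_model = decoder_input.size(-1)
    # Scaled dot-product scores between each input position and
    # every target embedding: (batch, tgt_len, vocab_size).
    scores = decoder_input @ tgt_embed_table.t() / d_model ** 0.5
    weights = F.softmax(scores, dim=-1)
    # Convex combination of target embeddings per position, i.e. a
    # point in the target representation space: (batch, tgt_len, d_model).
    return weights @ tgt_embed_table
```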