Paper Title
Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation
Paper Authors
Paper Abstract
In this paper, we take a substantial step toward better understanding state-of-the-art sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit translation quality (i.e., the domain discrepancy) and induce an over-estimation issue (i.e., the objective discrepancy). Based on these observations, we further propose two simple and effective strategies, in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach consistently improves both translation performance and model robustness upon Seq2Seq pretraining.