Paper Title
Improving Neural Machine Translation by Denoising Training
Paper Authors
Paper Abstract
We present a simple and effective pretraining strategy, {D}en{o}ising {T}raining (DoT), for neural machine translation. Specifically, we update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally. Notably, our approach does not add any parameters or training steps and requires only parallel data. Experiments show that DoT consistently improves neural machine translation performance across 12 bilingual and 16 multilingual directions (with data sizes ranging from 80K to 20M). In addition, we show that DoT complements existing data manipulation strategies, i.e., curriculum learning, knowledge distillation, data diversification, bidirectional training, and back-translation. Encouragingly, we find that DoT outperforms the costly pretrained model mBART in high-resource settings. Analyses show that DoT is a novel in-domain cross-lingual pretraining strategy and could offer further improvements with task-relevant self-supervision.
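The abstract describes a two-stage schedule: denoising updates on the parallel data early in training, then normal translation training for the remaining steps. The sketch below illustrates that schedule only; the `model` object with `loss`/`update` methods, the token-masking noise function, and the `denoise_steps` split are hypothetical placeholders, not the authors' actual implementation.

```python
# Minimal sketch of a DoT-style schedule, assuming a hypothetical `model`
# exposing loss(inputs, targets) and update(loss). The noise function and
# the denoising/translation step split are illustrative assumptions.
import random


def add_noise(tokens, mask_token="<mask>", mask_prob=0.15):
    """Randomly mask tokens so the model must reconstruct the clean sequence."""
    return [mask_token if random.random() < mask_prob else t for t in tokens]


def train(model, parallel_data, total_steps, denoise_steps):
    """Train for total_steps; the first denoise_steps use denoising objectives."""
    for step in range(total_steps):
        src, tgt = random.choice(parallel_data)
        if step < denoise_steps:
            # Early stage: source- and target-side denoising on the same
            # parallel data. No extra parameters and no steps beyond total_steps.
            loss = (model.loss(add_noise(src), src)
                    + model.loss(add_noise(tgt), tgt))
        else:
            # Later stage: normal source-to-target translation training.
            loss = model.loss(src, tgt)
        model.update(loss)
```

Because the denoising updates replace (rather than precede) part of the usual training budget and reuse only the parallel corpus, the total parameter count and step count stay unchanged, matching the claim in the abstract.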