Paper Title
Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation
Paper Authors
Paper Abstract
Back-translation provides a simple yet effective approach to exploit monolingual corpora in Neural Machine Translation (NMT). Its iterative variant, where two opposite NMT models are jointly trained by alternately using a synthetic parallel corpus generated by the reverse model, plays a central role in unsupervised machine translation. In order to start producing sound translations and provide a meaningful training signal to each other, existing approaches rely on either a separate machine translation system to warm up the iterative procedure, or some form of pre-training to initialize the weights of the model. In this paper, we analyze the role that such initialization plays in iterative back-translation. Is the behavior of the final system heavily dependent on it? Or does iterative back-translation converge to a similar solution given any reasonable initialization? Through a series of empirical experiments over a diverse set of warmup systems, we show that, although the quality of the initial system does affect final performance, its effect is relatively small, as iterative back-translation has a strong tendency to converge to a similar solution. As such, the margin of improvement left for the initialization method is narrow, suggesting that future research should focus more on improving the iterative mechanism itself.
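The abstract's description of iterative back-translation can be read as a simple alternating training loop. The following is a minimal sketch of that general scheme, not the authors' exact procedure; the helpers `translate` and `train_step`, and the model/corpus names, are hypothetical placeholders introduced only for illustration.

```python
# Illustrative sketch of iterative back-translation (not the paper's implementation).
# Assumed hypothetical helpers:
#   translate(model, sentences) -> list of translations
#   train_step(model, src_sentences, tgt_sentences) -> updated model

def iterative_back_translation(model_st, model_ts, mono_src, mono_tgt, iterations=10):
    """Alternately retrain two opposite NMT models on synthetic parallel data.

    model_st: source->target model (e.g. from a warmup system or pre-training)
    model_ts: target->source model
    mono_src, mono_tgt: monolingual corpora in the source and target languages
    """
    for _ in range(iterations):
        # Back-translate the target-side monolingual data with the reverse
        # (target->source) model, then train the source->target model on the
        # resulting synthetic parallel pairs.
        synthetic_src = translate(model_ts, mono_tgt)
        model_st = train_step(model_st, synthetic_src, mono_tgt)

        # Symmetrically, back-translate the source-side monolingual data with
        # the updated source->target model and retrain the target->source model.
        synthetic_tgt = translate(model_st, mono_src)
        model_ts = train_step(model_ts, mono_src, synthetic_tgt)

    return model_st, model_ts
```

In this reading, the initialization the paper studies corresponds to how `model_st` and `model_ts` are obtained before the loop starts, while the loop itself is the iterative mechanism whose convergence behavior the abstract discusses.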