Paper Title
Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation
Paper Authors
Paper Abstract
Self-supervised neural machine translation (SSNMT) jointly learns to identify and select suitable training data from comparable (rather than parallel) corpora and to translate, in a way that the two tasks support each other in a virtuous circle. In this study, we provide an in-depth analysis of the sampling choices the SSNMT model makes during training. We show how, without it having been told to do so, the model self-selects samples of increasing (i) complexity and (ii) task-relevance in combination with (iii) performing a denoising curriculum. We observe that the dynamics of the mutual-supervision signals of both system internal representation types are vital for the extraction and translation performance. We show that in terms of the Gunning-Fog Readability index, SSNMT starts extracting and learning from Wikipedia data suitable for high school students and quickly moves towards content suitable for first year undergraduate students.
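Since the abstract measures sample complexity via the Gunning-Fog readability index, the following minimal Python sketch shows how such a score can be computed for extracted sentences. It uses the standard Gunning-Fog formula (0.4 times the sum of average sentence length and the percentage of words with three or more syllables); the simple vowel-group syllable counter is an illustrative assumption, not the tooling used by the authors.

import re

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels as syllables.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def gunning_fog(text):
    # Gunning-Fog index = 0.4 * (avg words per sentence + % complex words),
    # where "complex" words have three or more syllables.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)

if __name__ == "__main__":
    sample = ("Self-supervised neural machine translation jointly learns to "
              "identify suitable training data and to translate.")
    print(f"Gunning-Fog index: {gunning_fog(sample):.1f}")

Scores around 10-12 roughly correspond to high-school reading level, and 13-16 to undergraduate level, which is the scale on which the abstract describes the drift of the self-selected training data.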