Paper Title
SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task
Paper Authors
Abstract
In this paper, we introduce our joint team SJTU-NICT's participation in the WMT 2020 machine translation shared task. In this shared task, we participated in four translation directions across three language pairs: English-Chinese and English-Polish on the supervised machine translation track, and German-Upper Sorbian on the low-resource and unsupervised machine translation tracks. Depending on the conditions of each language pair, we experimented with diverse neural machine translation (NMT) techniques: document-enhanced NMT, XLM pre-trained language model enhanced NMT, bidirectional translation as pre-training, reference-language-based UNMT, a data-dependent Gaussian prior objective, and BT-BLEU collaborative filtering self-training. We also used the TF-IDF algorithm to filter the training set, obtaining a subset whose domain is more similar to the test set for fine-tuning. Among our submissions, the primary systems won first place on the English-to-Chinese, Polish-to-English, and German-to-Upper-Sorbian translation directions.
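The abstract describes filtering the training set with TF-IDF to keep sentences whose domain resembles the test set. The paper's exact procedure is not given here, so the following is only a minimal sketch of that idea under assumed details: each sentence is turned into a TF-IDF vector, training sentences are scored by cosine similarity against the centroid of the test-set vectors, and the top-scoring sentences are kept (the function names `tfidf_vectors`, `cosine`, and `filter_by_domain` are illustrative, not from the paper).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of whitespace-tokenized docs."""
    df = Counter()
    for d in docs:
        df.update(set(d.split()))          # document frequency per word
    n = len(docs)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vecs = []
    for d in docs:
        tf = Counter(d.split())
        vecs.append({w: c * idf[w] for w, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_by_domain(train, test, top_k):
    """Keep the top_k training sentences most similar to the test-set domain."""
    vecs = tfidf_vectors(train + test)
    train_vecs, test_vecs = vecs[:len(train)], vecs[len(train):]
    # The centroid of the test-set vectors approximates the target domain.
    centroid = Counter()
    for v in test_vecs:
        for w, x in v.items():
            centroid[w] += x
    scores = [cosine(v, centroid) for v in train_vecs]
    ranked = sorted(range(len(train)), key=lambda i: -scores[i])
    return [train[i] for i in ranked[:top_k]]
```

In practice, a submission-scale system would run this over millions of sentence pairs with a proper tokenizer; the sketch only shows the scoring-and-ranking logic.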