Paper Title
Translating Hanja Historical Documents to Contemporary Korean and English
Paper Authors
Paper Abstract
The Annals of the Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993. However, the resulting translation was too literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012. Since then, the records of only one king have been completed in a decade. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. Thus, we propose H2KE, a neural machine translation model that translates historical documents in Hanja into more easily understandable Korean and into English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja from both a full dataset of outdated Korean translations and a small dataset of more recently translated contemporary Korean and English. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer-based model trained only on the newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct an extensive human evaluation, which shows that our translation is preferred over the original expert translations by both experts and non-expert Korean speakers.