Paper Title

Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara

Authors

Tapo, Allahsera Auguste; Coulibaly, Bakary; Diarra, Sébastien; Homan, Christopher; Kreutzer, Julia; Luger, Sarah; Nagashima, Arthur; Zampieri, Marcos; Leventhal, Michael

Abstract

Low-resource languages present unique challenges to (neural) machine translation. We discuss the case of Bambara, a Mande language for which training data is scarce and requires significant amounts of pre-processing. More than the linguistic situation of Bambara itself, the socio-cultural context within which Bambara speakers live poses challenges for automated processing of this language. In this paper, we present the first parallel data set for machine translation of Bambara into and from English and French and the first benchmark results on machine translation to and from Bambara. We discuss challenges in working with low-resource languages and propose strategies to cope with data scarcity in low-resource machine translation (MT).
