Paper Title

Towards Machine Translation for the Kurdish Language

Authors

Sina Ahmadi, Mariam Masoud

Abstract

Machine translation is the task of translating texts from one language to another using computers. It has been one of the major tasks in natural language processing and computational linguistics, motivated by the goal of facilitating human communication. Kurdish, an Indo-European language, has received little attention in this area because it is a less-resourced language. In this paper, we address the main issues in creating a machine translation system for the Kurdish language, with a focus on the Sorani dialect. We describe the scarce parallel data available for training a neural machine translation model for Sorani Kurdish-English translation. We also discuss some of the major challenges in Kurdish language translation and demonstrate how fundamental text processing tasks, such as tokenization, can improve translation performance.
