Paper Title

Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages

Paper Authors

Nalin Kumar, Deepak Kumar, Subhankar Mishra

Paper Abstract

Neural Machine Translation (NMT) models have been effective on large bilingual datasets. However, the existing methods and techniques show that the model's performance is highly dependent on the number of examples in training data. For many languages, having such an amount of corpora is a far-fetched dream. Taking inspiration from monolingual speakers exploring new languages using bilingual dictionaries, we investigate the applicability of bilingual dictionaries for languages with extremely low, or no bilingual corpus. In this paper, we explore methods using bilingual dictionaries with an NMT model to improve translations for extremely low resource languages. We extend this work to multilingual systems, exhibiting zero-shot properties. We present a detailed analysis of the effects of the quality of dictionaries, training dataset size, language family, etc., on the translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
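
The abstract does not spell out how the dictionary is combined with the NMT model, so the sketch below is purely illustrative and not the authors' published method: one common way to exploit a bilingual dictionary in extremely low-resource settings is word-for-word glossing of source text to produce noisy synthetic parallel data for fine-tuning. All helper names (`load_dictionary`, `translate_word_by_word`) and the file format are hypothetical assumptions.

```python
# Illustrative sketch (NOT the paper's method): build noisy synthetic parallel
# data by word-for-word substitution with a bilingual dictionary. Hypothetical
# helpers and file format; assumes a tab-separated "<source_word>\t<target_word>" file.

from typing import Dict, List


def load_dictionary(path: str) -> Dict[str, str]:
    """Load a tab-separated bilingual dictionary into a source->target mapping."""
    mapping: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 2:
                src, tgt = parts
                mapping[src] = tgt
    return mapping


def translate_word_by_word(sentence: str, dictionary: Dict[str, str]) -> str:
    """Replace each token with its dictionary translation; keep unknown tokens as-is."""
    tokens: List[str] = sentence.split()
    return " ".join(dictionary.get(tok.lower(), tok) for tok in tokens)


if __name__ == "__main__":
    # Toy English->German entries for demonstration only.
    toy_dict = {"the": "das", "house": "haus", "is": "ist", "big": "gross"}
    print(translate_word_by_word("the house is big", toy_dict))
    # -> "das haus ist gross" (a noisy gloss usable as synthetic training data)
```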
