Paper title

Tone prediction and orthographic conversion for Basaa

Authors

Ilya Nikitin, Brian O'Connor, Anastasia Safonova

Abstract

In this paper, we present a seq2seq approach for transliterating missionary Basaa orthographies into the official orthography. Our model uses Basaa missionary and official orthography corpora pre-trained with BERT. Since Basaa is a low-resource language, we use the mT5 model for our project. Before training, we pre-processed our corpora by eliminating one-to-one correspondences between spellings and by unifying characters that variably consist of one or two characters into a single-character form. Our best mT5 model achieved a character error rate (CER) of 12.6747 and a word error rate (WER) of 40.1012.
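
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: unifying one- and two-character spellings of the same letter, and scoring output with CER and WER. It assumes that unification can be approximated by Unicode NFC composition and that the reported scores are percentages of edit distance over reference length. Both are assumptions, and the function names `unify`, `edit_distance`, `cer`, and `wer` are hypothetical, not taken from the paper.

```python
import unicodedata


def unify(text: str) -> str:
    """Compose base letter + combining mark pairs into single code points.

    Assumption: NFC stands in for the paper's character unification step.
    """
    return unicodedata.normalize("NFC", text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, scaled to a percentage (assumed convention)."""
    ref, hyp = unify(ref), unify(hyp)
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace tokens, scaled to a percentage."""
    ref_words, hyp_words = unify(ref).split(), unify(hyp).split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)
```

For instance, `cer("abcd", "abed")` counts one substitution over four reference characters. NFC alone would not unify every Basaa letter: ɛ and ɔ have no precomposed accented forms in Unicode, so a custom mapping would be needed for those.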
