Paper Title
Composing RNNs and FSTs for Small Data: Recovering Missing Characters in Old Hawaiian Text
Paper Authors
Paper Abstract
In contrast to the older writing system of the 19th century, modern Hawaiian orthography employs characters for long vowels and glottal stops. These extra characters account for about one-third of the phonemes in Hawaiian, so including them makes a big difference to reading comprehension and pronunciation. However, transliterating between older and newer texts is a laborious task when performed manually. We introduce two related methods to help solve this transliteration problem automatically, given that there are not enough data to train an end-to-end deep learning model. One method is implemented, end-to-end, using finite state transducers (FSTs). The other is a hybrid deep learning approach which approximately composes an FST with a recurrent neural network (RNN). We find that the hybrid approach outperforms the end-to-end FST by partitioning the original problem into one part that can be modelled by hand using an FST, and another part that is easily solved by an RNN trained on the available data.
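The abstract does not spell out the composition mechanism, so the Python sketch below is only an illustration of the general idea under our own assumptions, not the authors' implementation: a small rule-based expansion step (standing in for the FST) enumerates candidate modern spellings of an old-orthography word, and a pluggable character-level scorer (standing in for the trained RNN) rescores the candidates. The names char_options, expand_word, score, and restore, as well as the placeholder scorer, are hypothetical.

# Hybrid FST/RNN sketch under stated assumptions (lowercase input assumed).
from itertools import product

VOWELS = "aeiou"
MACRON = {"a": "ā", "e": "ē", "i": "ī", "o": "ō", "u": "ū"}
OKINA = "ʻ"  # glottal-stop character used in modern Hawaiian orthography

def char_options(ch, prev):
    # FST-like per-character expansion: a vowel may stay short or take a
    # macron, and an ʻokina may be inserted before it at the start of a
    # word or between two vowels.
    if ch in VOWELS:
        variants = [ch, MACRON[ch]]
        if prev == "" or prev in VOWELS:
            variants += [OKINA + v for v in variants]
        return variants
    return [ch]

def expand_word(word):
    # Enumerate every candidate modern spelling of an old-orthography word.
    option_lists, prev = [], ""
    for ch in word:
        option_lists.append(char_options(ch, prev))
        prev = ch
    return ["".join(parts) for parts in product(*option_lists)]

def score(candidate):
    # Placeholder for a trained character-level RNN language model.
    # This trivial scorer prefers shorter strings so the sketch runs;
    # a real model would assign higher scores to well-formed Hawaiian.
    return -len(candidate)

def restore(word):
    return max(expand_word(word), key=score)

if __name__ == "__main__":
    # With the placeholder scorer this returns the unchanged word; with a
    # trained RNN it could choose a variant such as "pāʻē" instead.
    print(restore("pae"))

The point of the sketch is the division of labour the abstract describes: the hand-written expansion rules constrain the search space, while the learned scorer, trained on the limited available data, only has to rank the candidates rather than generate them from scratch.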