Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Title


Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Authors

Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Abstract

Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups of Japan. UNESCO classifies it as critically endangered, so archiving and documenting its linguistic heritage is of paramount importance. Although a considerable number of voice recordings of Ainu folklore have been produced and accumulated to preserve the culture, only a very limited portion of them has been transcribed so far. We therefore started a project on automatic speech recognition (ASR) for the Ainu language to contribute to the development of annotated language archives. In this paper, we report on the development of the speech corpus and on the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, reaching about 60% and over 85%, respectively, in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% were achieved in a speaker-closed setting. We also found that multilingual ASR training with additional English and Japanese speech corpora further improves speaker-open test accuracy.
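The abstract compares modeling units by word and phone accuracy. The following is not the authors' system; it is a minimal sketch that only illustrates why a single recognition error weighs differently at different unit granularities. It assumes romanized transcriptions with one letter per phone, a simplified (C)V(C) syllabification rule, and accuracy defined as one minus the Levenshtein error rate, (N - S - D - I) / N. All function names are hypothetical, and word pieces are left out because they depend on a trained subword model.

```python
from typing import List

# Assumption: romanized Ainu text, five vowels, one letter per phone.
VOWELS = set("aiueo")


def phones(word: str) -> List[str]:
    """Hypothetical phone split: keep letters only (drops '=', apostrophes)."""
    return [ch for ch in word.lower() if ch.isalpha()]


def syllables(word: str) -> List[str]:
    """Split a word into (C)V(C) syllables (simplified, hypothetical rule).

    A single consonant between two vowels opens the next syllable; when
    several consonants intervene, only the last one does and the rest
    close the preceding syllable.
    """
    ps = phones(word)
    nuclei = [i for i, p in enumerate(ps) if p in VOWELS]
    if not nuclei:
        return ["".join(ps)] if ps else []
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        gap = nxt - prev - 1  # number of consonants between the two vowels
        starts.append(nxt - 1 if gap >= 1 else nxt)
    bounds = starts + [len(ps)]
    return ["".join(ps[a:b]) for a, b in zip(bounds, bounds[1:])]


def tokenize(sentence: str, unit: str) -> List[str]:
    """Turn a transcription into word, syllable, or phone tokens."""
    words = sentence.split()
    if unit == "word":
        return words
    if unit == "syllable":
        return [s for w in words for s in syllables(w)]
    if unit == "phone":
        return [p for w in words for p in phones(w)]
    raise ValueError(f"unsupported unit: {unit}")


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def accuracy(ref: List[str], hyp: List[str]) -> float:
    """Accuracy as 1 - (S + D + I) / N, i.e. one minus the error rate."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    ref, hyp = "aynu kamuy", "aynu kamu"  # toy strings, not corpus data
    for unit in ("word", "syllable", "phone"):
        print(unit, round(accuracy(tokenize(ref, unit), tokenize(hyp, unit)), 3))
```

With this toy pair, dropping the final "y" costs half of the word accuracy (0.5) but only one of four syllables (0.75) and one of nine phones (about 0.889), which is one reason phone accuracy typically sits well above word accuracy.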
