Adaptive Activation Network for Low Resource Multilingual Speech Recognition

Paper Title

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

Authors

Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

Abstract

Low-resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts of training data. Existing models mostly establish a bottleneck (BN) layer by pre-training on a large source language and transferring it to the low-resource target language. In this work, we introduce an adaptive activation network into the upper layers of the ASR model and apply different activation functions to different languages. We also propose two approaches to train the model: (1) cross-lingual learning, which replaces the source-language activation function with a target-language one, and (2) multilingual learning, which jointly trains the Connectionist Temporal Classification (CTC) loss of each language and the relevance of different languages. Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform both from-scratch training and traditional bottleneck-feature-based methods. In addition, combining cross-lingual learning and multilingual learning further improves the performance of multilingual speech recognition.
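The abstract does not give the exact form of the adaptive activation, so the following is only a minimal sketch of the general idea: one layer whose weights are shared across languages while the activation function is selected per language. The leaky-ReLU-style activation, the `SharedLayer` class, the `add_target_language` helper, and all numeric values are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


def adaptive_activation(x, slope):
    """Leaky-ReLU-style activation with a per-language negative slope.

    This functional form is an assumption chosen for illustration.
    """
    return [v if v >= 0.0 else slope * v for v in x]


@dataclass
class SharedLayer:
    """One upper layer: shared weights plus one activation parameter per language."""
    weights: list                               # shared across languages, shape [out][in]
    slopes: dict = field(default_factory=dict)  # language name -> activation slope

    def forward(self, x, lang):
        # Shared linear transform, identical for every language.
        pre = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        # Language-specific nonlinearity.
        return adaptive_activation(pre, self.slopes[lang])

    def add_target_language(self, lang, init_slope=0.1):
        # Cross-lingual step (hypothetical helper): keep the shared weights
        # and attach a fresh activation parameter for the new language.
        self.slopes[lang] = init_slope


layer = SharedLayer(weights=[[1.0, -1.0], [0.5, 0.5]], slopes={"source": 0.25})
layer.add_target_language("target", init_slope=0.5)

x = [1.0, 3.0]
out_source = layer.forward(x, "source")
out_target = layer.forward(x, "target")
```

In this sketch the two languages see the same pre-activation values and differ only in how negative values are scaled. The multilingual variant described in the abstract would additionally sum a CTC loss per language during training, which is omitted here.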
