Paper Title
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding
Paper Authors
Paper Abstract
ASR systems designed for native English (L1) usually underperform on non-native English (L2). To address this performance gap, \textbf{(i)} we extend our previous work to investigate fine-tuning of a pre-trained wav2vec 2.0 model \cite{baevski2020wav2vec,xu2021self} under a rich set of L1 and L2 training conditions. We further \textbf{(ii)} incorporate language model decoding in the ASR system, along with the fine-tuning method. Quantifying the gains obtained from each of these two approaches separately, together with an error analysis, allows us to identify distinct sources of improvement within our models. We find that while the large self-trained wav2vec 2.0 may be internalizing sufficient decoding knowledge for clean L1 speech \cite{xu2021self}, this does not hold for L2 speech, which accounts for the utility of employing language model decoding on L2 data.
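The abstract's point \textbf{(ii)}, language model decoding, is typically realized as shallow fusion: candidate transcripts from the CTC acoustic model are scored by a weighted sum of the acoustic log-probability and a language-model log-probability. The paper itself presumably uses a full beam-search decoder with an n-gram LM; the sketch below is only a toy, exhaustive illustration of the scoring idea, with a made-up three-symbol vocabulary and a hypothetical `lm_logprob` callback standing in for a real language model.

```python
import itertools
import math

def ctc_collapse(path, blank=0):
    """Standard CTC mapping: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def decode_with_lm(log_probs, lm_logprob, alpha=0.5, blank=0):
    """Exhaustive CTC decoding with shallow LM fusion (toy-sized inputs only).

    log_probs:  list of per-frame log-probability lists over the vocabulary.
    lm_logprob: hypothetical callback mapping a collapsed label sequence
                to a language-model log-probability.
    alpha:      LM weight; alpha=0 recovers purely acoustic decoding.
    Returns the collapsed sequence maximizing acoustic + alpha * LM score.
    """
    vocab = range(len(log_probs[0]))
    # Sum acoustic probability over all frame paths mapping to each sequence.
    acoustic = {}
    for path in itertools.product(vocab, repeat=len(log_probs)):
        ac = sum(frame[s] for frame, s in zip(log_probs, path))
        seq = ctc_collapse(path, blank)
        if seq in acoustic:
            acoustic[seq] = math.log(math.exp(acoustic[seq]) + math.exp(ac))
        else:
            acoustic[seq] = ac
    # Rescore each candidate with the language model and pick the best.
    return max(acoustic, key=lambda seq: acoustic[seq] + alpha * lm_logprob(seq))

# Toy example: vocabulary is [blank, 'a', 'b']; frame 2 is acoustically
# ambiguous between 'a' and 'b', and the (made-up) LM strongly prefers "ab".
frames = [[0.1, 0.8, 0.1], [0.1, 0.5, 0.4]]
log_probs = [[math.log(p) for p in frame] for frame in frames]
lm = lambda seq: 0.0 if seq == (1, 2) else math.log(0.1 if seq else 0.01)

print(decode_with_lm(log_probs, lm, alpha=0.0))  # acoustics alone: (1,)
print(decode_with_lm(log_probs, lm, alpha=1.0))  # with LM fusion: (1, 2)
```

The example shows the mechanism the abstract relies on: when the acoustic model is uncertain, as it more often is on L2 speech, the LM term can flip the decision toward a linguistically plausible transcript. Real decoders avoid the exhaustive path enumeration above with beam search.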