Paper Title

Multi-head Monotonic Chunkwise Attention For Online Speech Recognition

Paper Authors

Baiji Liu, Songjun Cao, Sining Sun, Weibin Zhang, Long Ma

Abstract

The attention mechanism of the Listen, Attend and Spell (LAS) model requires the whole input sequence to calculate the attention context and is thus unsuitable for online speech recognition. To deal with this problem, we propose multi-head monotonic chunk-wise attention (MTH-MoChA), an improved version of MoChA. MTH-MoChA splits the input sequence into small chunks and computes multi-head attention over the chunks. We also explore useful training strategies such as LSTM pooling, minimum word error rate training, and SpecAugment to further improve the performance of MTH-MoChA. Experiments on AISHELL-1 data show that the proposed model, together with the training strategies, improves the character error rate (CER) of MoChA from 8.96% to 7.68% on the test set. On another 18,000-hour in-car speech dataset, MTH-MoChA obtains a CER of 7.28%, which is significantly better than that of a state-of-the-art hybrid system.
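The core mechanism described in the abstract (per-head monotonic boundary selection followed by soft attention over a fixed-size chunk ending at that boundary) can be sketched in a few lines. Below is a minimal, inference-time illustration in PyTorch; the function names, the plain dot-product energies, and the hard 0.5 firing threshold are simplifying assumptions for clarity, not the authors' exact formulation (which uses learned energy layers).

import torch
import torch.nn.functional as F

def mocha_head_step(query, keys, values, prev_bound, chunk_size=4):
    # One decoder step for a single monotonic head (hard, test-time mode).
    # query: (d,), keys/values: (T, d), prev_bound: boundary from last step.
    T = keys.size(0)
    bound = T - 1
    # Monotonic selection: scan forward from the previous boundary and
    # stop at the first frame whose selection probability fires (> 0.5).
    # NOTE: the dot-product energy here is an illustrative stand-in.
    for t in range(prev_bound, T):
        if torch.sigmoid(keys[t] @ query) > 0.5:
            bound = t
            break
    # Chunkwise soft attention over the window ending at the boundary.
    start = max(0, bound - chunk_size + 1)
    chunk_keys = keys[start:bound + 1]
    chunk_values = values[start:bound + 1]
    weights = F.softmax(chunk_keys @ query, dim=0)
    return weights @ chunk_values, bound

def mth_mocha_step(query, keys, values, prev_bounds, chunk_size=4):
    # Multi-head variant: split the model dimension into one subspace per
    # head, run an independent monotonic head in each subspace, and
    # concatenate the per-head chunk contexts.
    num_heads = len(prev_bounds)
    dh = query.size(0) // num_heads
    contexts, bounds = [], []
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        ctx, b = mocha_head_step(query[sl], keys[:, sl], values[:, sl],
                                 prev_bounds[h], chunk_size)
        contexts.append(ctx)
        bounds.append(b)
    return torch.cat(contexts), bounds

if __name__ == "__main__":
    T, d, num_heads = 20, 16, 4          # toy encoder length / dimensions
    enc = torch.randn(T, d)              # stand-in encoder outputs
    q = torch.randn(d)                   # stand-in decoder query
    ctx, bounds = mth_mocha_step(q, enc, enc, prev_bounds=[0] * num_heads)
    print(ctx.shape, bounds)             # torch.Size([16]), one boundary per head

At training time, MoChA-style models replace the hard selection above with its expected (soft) counterpart so that the boundary choice remains differentiable; the multi-head extension lets each head settle on a different chunk of the input.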
