Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging

Paper title

Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging

Authors

Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath

Abstract

End-to-end models that condition the output label sequence on all previously predicted labels have emerged as popular alternatives to conventional systems for automatic speech recognition (ASR). Since unique label histories correspond to distinct model states, such models are decoded using an approximate beam-search process that produces a tree of hypotheses. In this work, we study the influence of the amount of label context on the model's accuracy, and its impact on the efficiency of the decoding process. We find that we can limit the context of the recurrent neural network transducer (RNN-T) during training to just four previous word-piece labels, without degrading word error rate (WER) relative to the full-context baseline. Limiting context also provides opportunities to improve the efficiency of the beam-search process during decoding by removing redundant paths from the active beam, and instead retaining them in the final lattice. This path-merging scheme can also be applied when decoding the baseline full-context model through an approximation. Overall, we find that the proposed path-merging scheme is extremely effective, allowing us to improve oracle WERs by up to 36% over the baseline while simultaneously reducing the number of model evaluations by up to 5.3%, without any degradation in WER.
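
The path-merging idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `merge_paths`, the `(labels, log_prob)` tuple layout, and the choice to keep the higher-scoring hypothesis in the beam are all assumptions made for illustration. The sketch only shows the core observation that, when the model sees just the last few labels, hypotheses ending in the same labels lead to the same model state and can be collapsed in the active beam while the removed paths are set aside for the lattice.

```python
from typing import Dict, List, Tuple

# A hypothesis is (label sequence, log-probability). Hypothetical layout.
Hypothesis = Tuple[Tuple[str, ...], float]


def merge_paths(
    beam: List[Hypothesis], context_size: int = 4
) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Collapse hypotheses that share their last `context_size` labels.

    Returns (active, merged). `active` holds one hypothesis per distinct
    label context (the best-scoring one, an assumption of this sketch),
    sorted by descending score. `merged` holds the hypotheses removed from
    the beam, which a real decoder would retain in the lattice.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")

    best: Dict[Tuple[str, ...], Hypothesis] = {}
    merged: List[Hypothesis] = []
    for labels, score in beam:
        key = labels[-context_size:]  # the only history the model can see
        if key not in best:
            best[key] = (labels, score)
        elif score > best[key][1]:
            merged.append(best[key])  # demote the previous best
            best[key] = (labels, score)
        else:
            merged.append((labels, score))

    active = sorted(best.values(), key=lambda h: h[1], reverse=True)
    return active, merged


beam = [
    (("the", "cat", "sat", "on", "the", "mat"), -1.2),
    (("a", "cat", "sat", "on", "the", "mat"), -2.5),
    (("the", "cat", "sat", "on", "a", "mat"), -1.9),
]
active, merged = merge_paths(beam, context_size=4)
```

In this toy beam the first two hypotheses end in the same four labels, so only the better-scoring one stays active and the other moves to `merged`. The third hypothesis has a different context and is kept. Whole words stand in for word-piece labels here.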
