Paper title
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems
Authors
Abstract
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. It aims to improve recognition performance by biasing the ASR system toward particular context phrases such as person names, music lists, and proper nouns. Existing methods mainly include contextual LM biasing and adding a bias encoder to end-to-end ASR models. In this work, we introduce a novel approach to contextual biasing: adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large context lists, and performance balancing mechanisms to control the biasing degree of the model. We demonstrate that the proposed model is a general biasing solution that is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods. Compared to the AR solution, the proposed NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
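The headline result is stated as a relative WER reduction, which is easy to confuse with an absolute one. The following is a minimal sketch of how that metric is commonly computed, assuming the standard word-level edit-distance definition of WER. The function names and the two toy utterances are hypothetical illustrations, not data or code from the paper.

```python
from typing import List, Tuple


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical toy data: (reference, hypothesis) pairs.
references = ["call joaquin on teams", "play songs by siouxsie"]
asr_output = ["call walking on teams", "play songs by suzy"]
corrected = ["call joaquin on teams", "play songs by suzy"]

baseline = wer(list(zip(references, asr_output)))   # 2 errors / 8 words
improved = wer(list(zip(references, corrected)))    # 1 error / 8 words
reduction = relative_wer_reduction(baseline, improved)
print(f"baseline={baseline:.3f} improved={improved:.3f} relative reduction={reduction:.0%}")
```

In this toy case the absolute WER drops by 12.5 points (from 25% to 12.5%), which corresponds to a 50% relative reduction; the paper's "as much as 51%" figure is of the latter kind.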