Paper title
Multiple Confidence Gates For Joint Training Of SE And ASR
Paper authors
Paper abstract
Joint training of a speech enhancement (SE) model and a speech recognition (ASR) model is a common solution for robust ASR in noisy environments. SE focuses on improving the auditory quality of speech, but it changes the feature distribution in ways that are uncertain and detrimental to ASR. To tackle this challenge, an approach with multiple confidence gates for joint training of SE and ASR is proposed. A speech confidence gate prediction module is designed to replace the former SE module in joint training. The noisy speech is filtered by the gates to obtain features that are easier for the ASR network to fit. Experimental results show that the proposed method performs better than a traditional robust speech recognition system on test sets of clean speech, synthesized noisy speech, and real noisy speech.