Paper Title

A language score based output selection method for multilingual speech recognition

Authors

Nguyen, Van Huy; Dinh, Thi Quynh Khanh; Nguyen, Truong Thinh; Mac, Dang Khoa

Abstract

The quality of a multilingual speech recognition system can be improved by adaptation methods if the input language is specified. For systems that accept multilingual input, the popular approach is either to apply a language identifier to the input and then switch or configure the decoder in the next step, or to use an additional subsequent model to select the output from a set of candidates. Motivated by the goal of reducing latency in real-time applications, this paper first applies a language model rescoring method to produce all possible candidates for the target languages, then proposes a simple score that automatically selects the output without any identifier model or specification of the input language. The key point is that this score can be estimated simply and automatically on the fly, so the whole decoding pipeline is simpler and more compact. Experimental results show that this method achieves the same quality as when the input language is specified. In addition, we present the design of an English and Vietnamese End-to-End model, which both handles the problem of cross-lingual speakers and improves the accuracy of English loanwords in Vietnamese.
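The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the proposed score, so the `language_score` function below is a hypothetical stand-in (language model log-probability per token), and the `Candidate` fields and example values are assumptions made only for illustration. It shows one way an output might be picked from per-language rescored candidates without a separate language identifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    language: str      # target language whose LM rescored this hypothesis
    text: str          # hypothesis text after rescoring
    lm_logprob: float  # total LM log-probability (hypothetical field)


def language_score(candidate: Candidate) -> float:
    """Hypothetical score: LM log-probability per token.

    This is NOT the score from the paper, which the abstract does not
    specify; it only stands in for "a simple score computed on the fly".
    """
    num_tokens = max(len(candidate.text.split()), 1)
    return candidate.lm_logprob / num_tokens


def select_output(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the highest language score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=language_score)


# Made-up example: one rescored hypothesis per target language.
candidates = [
    Candidate("en", "hello how are you", -8.0),
    Candidate("vi", "he lo hao a du", -30.0),
]
best = select_output(candidates)
print(best.language, best.text)
```

Because the score is computed directly from quantities the rescoring pass already produces, no extra model has to run before the output is chosen, which is the latency argument the abstract makes.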
