Paper title
A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling
Authors
Abstract
Automatic speech recognition (ASR) systems deployed on smartphones or in vehicles are usually required to process speech queries from very different domains. In such situations, a vanilla ASR system usually fails to perform well on every domain. This paper proposes a multi-domain ASR framework for Tencent Map, a navigation app used on smartphones and in-vehicle infotainment systems. The proposed framework consists of three core parts: a basic ASR module that generates an n-best list for a speech query, a text classification module that determines which domain the query belongs to, and a reranking module that rescores the n-best list using domain-specific language models. In addition, an instance-sampling-based method for training neural network language models (NNLMs) is proposed to address the data imbalance problem in multi-domain ASR. In experiments, the proposed framework was evaluated on the navigation and music domains, since navigating and playing music are two main features of Tencent Map. Compared to a general ASR system, the proposed framework achieves a relative character error rate reduction of 13% to 22% on several test sets collected from Tencent Map and our in-car voice assistant.
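The three-part pipeline described in the abstract (n-best generation, domain classification, domain-specific rescoring) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The names `Hypothesis`, `rerank`, `toy_classify`, and `TOY_LMS` are hypothetical, the keyword classifier and the scoring functions are toy stand-ins for trained models, and two choices are assumptions the abstract does not specify: classifying the domain from the top ASR hypothesis, and combining ASR and language model scores by weighted log-linear interpolation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-score from the base ASR module (higher is better)


def rerank(
    nbest: List[Hypothesis],
    classify: Callable[[str], str],
    domain_lms: Dict[str, Callable[[str], float]],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Rerank an n-best list with the language model of the predicted domain."""
    if not nbest:
        return []
    # Assumption: the domain is predicted from the best-scoring ASR hypothesis.
    top = max(nbest, key=lambda h: h.asr_score)
    lm = domain_lms[classify(top.text)]
    # Assumption: final score = ASR log-score + lm_weight * domain LM log-score.
    return sorted(
        nbest,
        key=lambda h: h.asr_score + lm_weight * lm(h.text),
        reverse=True,
    )


def toy_classify(text: str) -> str:
    """Toy keyword classifier standing in for a trained text classifier."""
    return "music" if "play" in text else "navigation"


# Toy scoring functions standing in for domain-specific language models.
TOY_LMS: Dict[str, Callable[[str], float]] = {
    "music": lambda text: 0.0 if "hey jude" in text else -2.0,
    "navigation": lambda text: -1.0,
}

nbest = [
    Hypothesis("play hey dude", asr_score=-1.0),
    Hypothesis("play hey jude", asr_score=-1.2),
]
best = rerank(nbest, toy_classify, TOY_LMS)[0]
print(best.text)
```

In this toy run the base ASR score prefers "play hey dude", but the music-domain scorer penalizes it enough that "play hey jude" moves to the top of the list, which is the effect domain-specific rescoring is meant to have.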