Paper Title
Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems
Paper Authors
Paper Abstract
In this paper, we present a series of complementary approaches that improve the recognition of underrepresented named entities (NEs) in hybrid ASR systems without compromising overall word error rate performance. Underrepresented words correspond to rare or out-of-vocabulary (OOV) words in the training data and therefore cannot be modeled reliably. We begin with a graphemic lexicon, which eliminates the need for phonetic models in hybrid ASR. We study it under different settings and demonstrate its effectiveness in dealing with underrepresented NEs. Next, we study the impact of a neural language model (LM) with letter-based features designed to handle infrequent words. We then enrich the representations of underrepresented NEs in a pretrained neural LM by borrowing the embedding representations of well-represented words, which yields a significant performance improvement on underrepresented NE recognition. Finally, we boost the likelihood scores of utterances containing NEs in the word lattices rescored by neural LMs, gaining a further performance improvement. The combination of these approaches improves NE recognition by up to 42% relative.
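The embedding-borrowing step described in the abstract can be pictured with a minimal sketch: a rare NE's row in a pretrained LM embedding matrix is overwritten with the embedding of a well-represented donor word. All names, the donor-selection choice, and the toy data below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Toy vocabulary: the last word stands in for a rare named entity.
rng = np.random.default_rng(0)
vocab = ["the", "london", "paris", "zvenigorod"]
word2id = {w: i for i, w in enumerate(vocab)}
# Stand-in for a pretrained LM's input embedding matrix (vocab_size x dim).
emb = rng.normal(size=(len(vocab), 8))

def borrow_embedding(emb, word2id, rare_word, donor_word):
    """Return a copy of the embedding matrix where the rare word's row
    is replaced by the well-represented donor word's row."""
    out = emb.copy()
    out[word2id[rare_word]] = out[word2id[donor_word]]
    return out

# Enrich the rare NE with a (hypothetically chosen) frequent donor.
enriched = borrow_embedding(emb, word2id, "zvenigorod", "london")
```

In practice the donor could be chosen by similarity in spelling, pronunciation, or syntactic role; the sketch leaves that policy open and only shows the row-replacement mechanics.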