Paper Title
Transducer-based language embedding for spoken language identification
Paper Authors
Paper Abstract
Acoustic and linguistic features are both important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features and lack explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. Experiments were carried out on the large-scale multilingual LibriSpeech and VoxLingua107 datasets. Experimental results showed that the proposed method significantly improves LID performance, with relative improvements of 12% to 59% on in-domain datasets and 16% to 24% on cross-domain datasets, respectively.
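To make the described architecture concrete, below is a minimal, hypothetical sketch of the idea in the abstract: a pretrained RNN-transducer (RNN-T) encoder supplies phonetically-aware frame-level features, which are pooled into a fixed-size language embedding and classified over the candidate languages. All module names, dimensions, and the simple mean pooling used here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a transducer-based language embedding model for LID.
# Assumes the RNN-T encoder maps (batch, time, feat_dim) -> (batch, time, enc_dim).
import torch
import torch.nn as nn


class TransducerLanguageEmbedding(nn.Module):
    def __init__(self, rnnt_encoder: nn.Module, enc_dim: int, embed_dim: int, num_languages: int):
        super().__init__()
        self.rnnt_encoder = rnnt_encoder        # pretrained RNN-T encoder (assumed; frozen or fine-tuned)
        self.pooling = nn.AdaptiveAvgPool1d(1)  # mean pooling over time (stand-in for statistics pooling)
        self.embedding = nn.Linear(enc_dim, embed_dim)      # fixed-size language embedding
        self.classifier = nn.Linear(embed_dim, num_languages)

    def forward(self, acoustic_feats: torch.Tensor) -> torch.Tensor:
        # acoustic_feats: (batch, time, feat_dim), e.g. filterbank features
        enc = self.rnnt_encoder(acoustic_feats)                   # phonetically-aware features
        pooled = self.pooling(enc.transpose(1, 2)).squeeze(-1)   # (batch, enc_dim)
        lang_emb = self.embedding(pooled)                         # language embedding
        return self.classifier(lang_emb)                          # per-language logits


# Toy usage: a per-frame Linear layer stands in for the pretrained RNN-T encoder.
toy_encoder = nn.Linear(80, 256)
model = TransducerLanguageEmbedding(toy_encoder, enc_dim=256, embed_dim=192, num_languages=107)
logits = model(torch.randn(4, 200, 80))  # -> shape (4, 107)
```

In a real system the toy encoder would be replaced by the encoder of a trained RNN transducer ASR model, which is what gives the pooled embedding its explicit linguistic information.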