Paper Title
Transducer-based language embedding for spoken language identification
Paper Authors
Paper Abstract
Acoustic and linguistic features are both important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features and lack explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. Experiments were carried out on the large-scale multilingual LibriSpeech and VoxLingua107 datasets. Experimental results showed that the proposed method significantly improves LID performance, with relative improvements of 12% to 59% on in-domain datasets and 16% to 24% on cross-domain datasets, respectively.
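To make the described architecture concrete, below is a minimal, hypothetical sketch of the idea in the abstract: a pretrained RNN-transducer (RNN-T) encoder supplies phonetically-aware frame-level features, which are pooled into a fixed-size language embedding and classified over the candidate languages. All module names, dimensions, and the simple mean pooling used here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a transducer-based language embedding model for LID.
# Assumes the RNN-T encoder maps (batch, time, feat_dim) -> (batch, time, enc_dim).
import torch
import torch.nn as nn


class TransducerLanguageEmbedding(nn.Module):
    def __init__(self, rnnt_encoder: nn.Module, enc_dim: int, embed_dim: int, num_languages: int):
        super().__init__()
        self.rnnt_encoder = rnnt_encoder        # pretrained RNN-T encoder (assumed; frozen or fine-tuned)
        self.pooling = nn.AdaptiveAvgPool1d(1)  # mean pooling over time (stand-in for statistics pooling)
        self.embedding = nn.Linear(enc_dim, embed_dim)      # fixed-size language embedding
        self.classifier = nn.Linear(embed_dim, num_languages)

    def forward(self, acoustic_feats: torch.Tensor) -> torch.Tensor:
        # acoustic_feats: (batch, time, feat_dim), e.g. filterbank features
        enc = self.rnnt_encoder(acoustic_feats)                   # phonetically-aware features
        pooled = self.pooling(enc.transpose(1, 2)).squeeze(-1)   # (batch, enc_dim)
        lang_emb = self.embedding(pooled)                         # language embedding
        return self.classifier(lang_emb)                          # per-language logits


# Toy usage: a per-frame Linear layer stands in for the pretrained RNN-T encoder.
toy_encoder = nn.Linear(80, 256)
model = TransducerLanguageEmbedding(toy_encoder, enc_dim=256, embed_dim=192, num_languages=107)
logits = model(torch.randn(4, 200, 80))  # -> shape (4, 107)
```

In a real system the toy encoder would be replaced by the encoder of a trained RNN transducer ASR model, which is what gives the pooled embedding its explicit linguistic information.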