Paper title
Automatic Spoken Language Identification using a Time-Delay Neural Network
Paper authors
Abstract
Closed-set spoken language identification is the task of recognizing, from a set of known languages, the language being spoken in a recorded audio clip. In this study, a language identification system was built and trained to distinguish between Arabic, Spanish, French, and Turkish based on nothing more than recorded speech. A pre-existing multilingual dataset was used to train a series of acoustic models based on the Tedlium TDNN model to perform automatic speech recognition. The system was provided with a custom multilingual language model and a specialized pronunciation lexicon in which language names were prepended to phones. The trained model was used to generate phone alignments for test data from all four languages, and the language of each utterance was predicted by a voting scheme that chose the most common language prefix in that utterance. Accuracy was measured by comparing predicted languages to known languages; it was very high for Spanish and Arabic and somewhat lower for Turkish and French.
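The voting scheme and accuracy measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the underscore separator between language name and phone, the treatment of unprefixed symbols such as `SIL` as non-voting, and the function names `prefix_phones`, `predict_language`, and `per_language_accuracy` are all assumptions made for illustration.

```python
from collections import Counter


def prefix_phones(language, phones):
    """Tag each phone with its language, e.g. 'a' -> 'spanish_a'.

    The 'language_phone' format is an assumed convention, not the paper's.
    """
    return [f"{language}_{phone}" for phone in phones]


def predict_language(aligned_phones):
    """Return the most common language prefix in one utterance's phone alignment.

    Symbols without a prefix (assumed here to be silence or noise markers)
    do not vote. Returns None when no phone carries a language prefix.
    """
    votes = Counter(
        phone.split("_", 1)[0] for phone in aligned_phones if "_" in phone
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def per_language_accuracy(predicted, actual):
    """Fraction of utterances of each known language that were predicted correctly."""
    totals, correct = Counter(), Counter()
    for guess, truth in zip(predicted, actual):
        totals[truth] += 1
        if guess == truth:
            correct[truth] += 1
    return {language: correct[language] / totals[language] for language in totals}


if __name__ == "__main__":
    # Hypothetical alignment: mostly Spanish-tagged phones, one French-tagged phone.
    alignment = (
        ["SIL"]
        + prefix_phones("spanish", ["o", "l", "a"])
        + prefix_phones("french", ["a"])
    )
    print(predict_language(alignment))
    print(per_language_accuracy(["spanish", "french"], ["spanish", "turkish"]))
```

Because the vote is taken over a whole utterance, a few phones aligned to the wrong language do not change the prediction as long as the correct language still holds the plurality. How ties are resolved is not stated in the abstract; this sketch simply takes whichever language `Counter.most_common` returns first.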