Paper Title

Speech Activity Detection Based on Multilingual Speech Recognition System

Authors

Sarfjoo, Seyyed Saeed, Madikeri, Srikanth, Motlicek, Petr

Abstract

To better model contextual information and improve the generalization ability of a Speech Activity Detection (SAD) system, this paper leverages a multilingual Automatic Speech Recognition (ASR) system to perform SAD. Sequence-discriminative training of the Acoustic Model (AM) with the Lattice-Free Maximum Mutual Information (LF-MMI) loss function effectively extracts the contextual information of the input acoustic frames, and multilingual AM training yields robustness to noise and language variability. The index of the maximum output posterior is used as a frame-level speech/non-speech decision function, and majority voting and logistic regression are applied to fuse the language-dependent decisions. The multilingual ASR is trained on 18 languages of the BABEL datasets, and the resulting SAD is evaluated on 3 different languages. On out-of-domain datasets, the proposed SAD model performs significantly better than the baseline models. On the Ester2 dataset, without using any in-domain data, it outperforms the WebRTC, phoneme-recognizer-based VAD (Phn Rec), and Pyannote baselines in the Detection Error Rate (DetER) metric by 7.1, 1.7, and 2.7% absolute, respectively. Similarly, on the LiveATC dataset, it outperforms the WebRTC, Phn Rec, and Pyannote baselines by 6.4, 10.0, and 3.7% absolute in DetER.
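The two decision steps in the abstract — taking the argmax of the AM's output posteriors as a frame-level speech/non-speech decision, then fusing the language-dependent decisions by majority voting — can be sketched as below. This is a minimal illustration, not the paper's implementation: the array shapes, the `speech_indices` set marking which output units count as speech, and the tie-breaking rule are all assumptions for demonstration.

```python
import numpy as np

def frame_decisions(posteriors, speech_indices):
    """Frame-level speech/non-speech decision from one language-dependent AM.

    posteriors: (T, C) array of output posteriors over C output units for T frames.
    speech_indices: output-unit indices treated as speech (assumed labeling).
    Returns a (T,) binary array: 1 = speech, 0 = non-speech.
    """
    hard = np.argmax(posteriors, axis=1)          # index of maximum posterior per frame
    return np.isin(hard, speech_indices).astype(int)

def majority_vote(decisions):
    """Fuse binary decisions from L language-dependent systems by majority vote.

    decisions: (L, T) binary array. Ties are counted as speech (assumption).
    Returns a (T,) fused binary decision.
    """
    votes = decisions.sum(axis=0)
    return (2 * votes >= decisions.shape[0]).astype(int)
```

The logistic-regression fusion mentioned in the abstract would instead learn per-language weights over the same frame-level decisions (or posteriors), which requires some labeled adaptation data, whereas majority voting is parameter-free.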
