Paper Title
Voice Activity Detection Scheme by Combining DNN Model with GMM Model
Authors
Abstract
Deep neural networks (DNNs) are widely used in voice activity detection (VAD) because of their superior modeling ability. However, their performance may degrade when sufficient training data, especially data from practical environments, is not available, which leads to poor adaptation to the environment. Moreover, large model structures cannot always be used in practice, especially on low-cost devices with restricted hardware. The Gaussian mixture model (GMM) has the opposite profile: its parameters can be updated in real time, but its modeling ability is low. In this paper, a deeply integrated scheme combining these two models is proposed to improve both adaptability and modeling ability. This is done by directly combining the results of the two models and feeding the combined result back, together with the result of the DNN model, to update the GMM model. In addition, a control scheme is designed to detect the endpoints of speech. The superior performance of this scheme is validated through practical experiments, which give insight into the advantage of combining supervised and unsupervised learning.
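The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the fusion rule, or the update rule, so everything below is an assumption made for illustration. The sketch assumes a one-dimensional feature (for example, frame log-energy), a single Gaussian each for speech and noise in place of a full GMM, a weighted average as the fusion rule, an exponential-forgetting update, and a DNN speech posterior supplied by the caller. The names HybridVad, Gaussian, weight, threshold, and rate are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """One-dimensional Gaussian with an online (exponential-forgetting) update."""
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        norm = math.sqrt(2.0 * math.pi * self.var)
        return math.exp(-0.5 * (x - self.mean) ** 2 / self.var) / norm

    def update(self, x: float, rate: float) -> None:
        # Exponentially weighted mean and variance; the floor keeps var positive.
        delta = x - self.mean
        self.mean += rate * delta
        self.var = max((1.0 - rate) * (self.var + rate * delta * delta), 1e-3)


class HybridVad:
    """Assumed fusion of a DNN speech posterior with an adaptive two-class model."""

    def __init__(self, weight: float = 0.5, threshold: float = 0.5,
                 rate: float = 0.05) -> None:
        self.speech = Gaussian(mean=10.0, var=4.0)  # assumed initial speech model
        self.noise = Gaussian(mean=0.0, var=4.0)    # assumed initial noise model
        self.weight = weight        # share of the DNN posterior in the fused score
        self.threshold = threshold  # decision threshold on posteriors
        self.rate = rate            # forgetting factor for the online update

    def gmm_posterior(self, x: float) -> float:
        ps, pn = self.speech.pdf(x), self.noise.pdf(x)
        return ps / (ps + pn + 1e-12)

    def step(self, x: float, dnn_posterior: float) -> bool:
        # 1) Combine the two model outputs directly (weighted average, assumed).
        fused = (self.weight * dnn_posterior
                 + (1.0 - self.weight) * self.gmm_posterior(x))
        is_speech = fused >= self.threshold
        # 2) Feed the fused decision back, together with the DNN result:
        #    adapt a class model only when both agree on the label.
        dnn_speech = dnn_posterior >= self.threshold
        if is_speech and dnn_speech:
            self.speech.update(x, self.rate)
        elif not is_speech and not dnn_speech:
            self.noise.update(x, self.rate)
        return is_speech


vad = HybridVad()
# Noise frames at a level the initial noise model did not expect.
noise_decisions = [vad.step(3.0, 0.1) for _ in range(200)]
speech_decision = vad.step(10.0, 0.9)
```

Under these assumptions, the noise model drifts toward the observed noise level while the speech model is left untouched, which is the adaptation behaviour the abstract attributes to the GMM side. Requiring agreement before an update is one conservative design choice; the paper may use a different gating rule.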