Paper Title

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Paper Authors

Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

Paper Abstract

Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than those of deep feed-forward neural networks (DNNs), which limits their application to speech enhancement. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN has great potential for capturing long-range temporal contexts through a modular and incremental design. Moreover, the TDNN preserves a feed-forward structure, so its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement. More specifically, we train the enhancement model not only on noisy-to-clean (input-to-target) pairs, but also on clean-to-clean and noise-to-silence pairs, so that all of the training data can be used. Our experiments are conducted on the TIMIT dataset. Experimental results show that the proposed method achieves better performance than the DNN and comparable or even better performance than the BLSTM. Meanwhile, compared with the BLSTM, the proposed method drastically reduces the inference time.
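
As a concrete illustration of the architecture described in the abstract, the following is a minimal sketch in PyTorch, not the authors' implementation: each TDNN layer is modeled as a dilated 1-D convolution over time, so stacking layers grows the temporal receptive field incrementally while the whole network remains feed-forward. The feature dimension, hidden width, and dilation schedule are illustrative assumptions.

```python
# Hypothetical sketch of a TDNN enhancer: stacked dilated 1-D convolutions
# over time approximate the modular, incremental TDNN design while keeping
# a purely feed-forward (DNN-like) inference path.
import torch
import torch.nn as nn

FEAT_DIM = 257  # assumed: e.g. 512-point STFT magnitude bins

class TDNNEnhancer(nn.Module):
    def __init__(self, feat_dim=FEAT_DIM, hidden=512, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, in_ch = [], feat_dim
        for d in dilations:
            # kernel_size=3 with dilation d connects frames {t-d, t, t+d};
            # each added layer widens the temporal context incrementally.
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3,
                                 dilation=d, padding=d),
                       nn.ReLU()]
            in_ch = hidden
        layers.append(nn.Conv1d(in_ch, feat_dim, kernel_size=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, feat_dim, frames)
        return self.net(x)
```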

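The full data learning scheme can likewise be sketched as pair construction, reusing the TDNNEnhancer sketch above: besides noisy-to-clean pairs, clean speech is mapped to itself and pure noise to silence, so every utterance in the training set contributes a target. The zero-spectrogram silence target and the MSE mapping loss below are assumptions for illustration.

```python
# Hypothetical sketch of full data learning: build all three
# (input, target) pair types named in the abstract and train with
# a simple spectral-mapping MSE loss (assumed).
def full_data_pairs(noisy, clean, noise):
    silence = torch.zeros_like(noise)          # noise-to-silence target
    inputs = torch.cat([noisy, clean, noise])  # noisy / clean / noise inputs
    targets = torch.cat([clean, clean, silence])
    return inputs, targets

# Usage: one toy training step on random tensors of 100 frames.
model = TDNNEnhancer()
noisy, clean, noise = (torch.randn(4, FEAT_DIM, 100) for _ in range(3))
x, y = full_data_pairs(noisy, clean, noise)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
```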