Paper title
Real-time speech enhancement using equilibriated RNN
Paper authors
Paper abstract
We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNNs have been widely used for estimating a time-frequency (T-F) mask which enhances a speech signal. One popular DNN structure for this task is the recurrent neural network (RNN), owing to its capability of effectively modelling time-sequential data such as speech. In particular, the long short-term memory (LSTM) is often used to alleviate the vanishing/exploding gradient problem, which makes the training of an RNN difficult. However, the number of parameters of an LSTM increases as the price of mitigating this training difficulty, which requires more computational resources. For real-time speech enhancement, it is preferable to use a smaller network without sacrificing performance. In this paper, we propose to use the equilibriated recurrent neural network (ERNN) to avoid the vanishing/exploding gradient problem without increasing the number of parameters. The proposed structure is causal, requiring only information from the past, so that it can be applied in real time. Compared to uni- and bi-directional LSTM networks, the proposed method achieved similar performance with much fewer parameters.
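To make the T-F masking pipeline described in the abstract concrete, below is a minimal sketch of mask-based enhancement: a signal is transformed to the T-F domain by an STFT, a mask in [0, 1] is applied per T-F bin, and the result is inverted back to a waveform. This is not the authors' ERNN model; for illustration an oracle ratio mask is computed from the clean and noise signals, standing in for the mask that the DNN/ERNN would estimate from the noisy input alone. The naive STFT/iSTFT helpers and the toy sinusoid-plus-noise signals are assumptions for the sake of a self-contained example.

```python
import numpy as np

def stft(x, win_len=256, hop=128):
    """Naive STFT: Hann-windowed frames -> rFFT (illustration only)."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (frames, freq_bins)

def istft(X, win_len=256, hop=128):
    """Overlap-add inverse of the naive STFT above."""
    win = np.hanning(win_len)
    frames = np.fft.irfft(X, n=win_len, axis=1)
    out = np.zeros(hop * (X.shape[0] - 1) + win_len)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + win_len] += f * win
        norm[i * hop : i * hop + win_len] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Toy signals: a sinusoid as "speech" plus white "noise" (assumed data).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
speech = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(len(t))
noisy = speech + noise

# Oracle ratio mask per T-F bin. In the paper's setting, a causal
# network (e.g. the ERNN) would estimate this mask from the noisy
# spectrogram frame by frame; the clean/noise signals are used here
# only to illustrate what the mask does.
S, N = stft(speech), stft(noise)
mask = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-8)
enhanced = istft(mask * stft(noisy))

def snr_db(ref, est):
    """Signal-to-noise ratio of an estimate against a reference, in dB."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))
```

On this toy example, masking suppresses the noise energy outside the tone's frequency bins, so the SNR of `enhanced` (edges trimmed to avoid overlap-add boundary effects) is well above the roughly 3 dB SNR of `noisy`.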