Paper Title
Phase-Aware Speech Enhancement with a Recurrent Two Stage Network
Paper Authors
Paper Abstract
We propose a neural network-based speech enhancement (SE) method called the phase-aware recurrent two-stage network (rTSN). The rTSN extends our previously proposed two-stage network (TSN) framework. The TSN framework is equipped with a boosting strategy (BS): a prior neural network (pri-NN) first estimates multiple base predictions (MBPs), and a posterior neural network (post-NN) then aggregates the MBPs to obtain the final prediction. The TSN outperformed various state-of-the-art methods; however, it used a simple deep neural network as the pri-NN. We found that the pri-NN affects performance (in terms of perceptual quality) more than the post-NN does. We therefore adopt a long short-term memory recurrent neural network (LSTM-RNN) as the pri-NN to make better use of the context information within speech signals. Furthermore, the TSN framework did not consider phase reconstruction, even though phase information affects perceptual quality. We therefore propose to adopt a phase reconstruction method based on the Griffin-Lim algorithm. Finally, we evaluate the rTSN against baselines such as the TSN using perceptual-quality-related metrics as well as the phone recognition error rate.
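The abstract names the Griffin-Lim algorithm for phase reconstruction but does not say how it is integrated into the rTSN. As a rough illustration only, the following is a minimal NumPy sketch of the textbook Griffin-Lim iteration: starting from a magnitude spectrogram and a random phase, it alternates between an inverse STFT and a forward STFT, keeping the target magnitude and updating only the phase. The frame length, hop size, iteration count, random initialization, and all function names (`stft`, `istft`, `griffin_lim`, `spectral_error`) are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

N_FFT = 256  # frame length (illustrative choice, not from the paper)
HOP = 64     # hop size (illustrative choice, not from the paper)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT; returns shape (n_frames, n_fft // 2 + 1).

    Assumes len(x) >= n_fft; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT: windowed overlap-add divided by the sum of w^2."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (spec.shape[0] - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        signal[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    # The floor only guards samples where the window sum is exactly zero.
    return signal / np.maximum(norm, 1e-12)


def griffin_lim(magnitude, n_fft=N_FFT, hop=HOP, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Random initial phase (an assumption; other initializations are possible).
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        x = istft(magnitude * phase, n_fft, hop)   # back to the time domain
        spec = stft(x, n_fft, hop)                 # re-analyze the signal
        phase = np.exp(1j * np.angle(spec))        # keep phase, discard magnitude
    return istft(magnitude * phase, n_fft, hop)


def spectral_error(x, magnitude, n_fft=N_FFT, hop=HOP):
    """Relative distance between the STFT magnitude of x and the target magnitude."""
    est = np.abs(stft(x, n_fft, hop))
    return np.linalg.norm(est - magnitude) / np.linalg.norm(magnitude)


if __name__ == "__main__":
    # Synthetic two-tone signal standing in for an enhanced magnitude spectrogram.
    t = np.arange(4096) / 8000.0
    clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
    target = np.abs(stft(clean))
    print("error with random phase:", spectral_error(griffin_lim(target, n_iter=0), target))
    print("error after 50 iterations:", spectral_error(griffin_lim(target), target))
```

In a speech enhancement pipeline, `magnitude` would presumably be the enhanced magnitude spectrogram produced by the network rather than one computed from a clean signal. Whether the rTSN starts from a random phase, from the noisy phase, or from something else is not stated in the abstract.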