Speech Enhancement with Intelligent Neural Homomorphic Synthesis

Paper Title

Speech Enhancement with Intelligent Neural Homomorphic Synthesis

Authors

Shulin He, Wei Rao, Jinjiang Liu, Jun Chen, Yukai Ju, Xueliang Zhang, Yannan Wang, Shidong Shang

Abstract

Most neural-network speech enhancement models ignore the mathematical model of speech production and instead map Fourier-transform spectra or waveforms directly. In this work, we propose a neural source-filter network for speech enhancement. Specifically, we use homomorphic signal processing and cepstral analysis to obtain the excitation and vocal tract components of noisy speech. Unlike traditional signal processing, we replace the liftering separation function with a ratio mask predicted by an attentive recurrent network (ARN). Two convolutional attentive recurrent networks (CARNs) then predict the excitation and vocal tract of clean speech, respectively. The system's output is synthesized from the estimated excitation and vocal tract. Experiments show that the proposed method performs better, improving SI-SNR by 1.363 dB over FullSubNet.
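
For readers unfamiliar with homomorphic processing, the classical (non-neural) decomposition that the abstract refers to can be sketched as follows. This is only an illustration of textbook cepstral liftering on a single frame, not the authors' implementation: the function names `homomorphic_decompose` and `synthesize`, the cutoff `n_lifter`, and the `1e-8` floor are assumptions made for this example. In the paper, the fixed low-time lifter used here is what the ARN-predicted ratio mask replaces.

```python
import numpy as np

def homomorphic_decompose(frame, n_lifter=30):
    """Split one windowed frame into vocal-tract and excitation
    log-magnitude spectra with a fixed low-time lifter (textbook method).

    n_lifter is a hypothetical cutoff in quefrency bins, chosen for illustration.
    """
    n_fft = len(frame)
    spectrum = np.fft.fft(frame)
    # Small floor (assumed value) avoids log(0) on empty bins.
    log_mag = np.log(np.abs(spectrum) + 1e-8)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.ifft(log_mag).real

    # Symmetric low-time lifter: keep quefrencies 0..n_lifter-1
    # and their mirror images at the end of the array.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0

    # Low quefrencies approximate the vocal tract (spectral envelope);
    # the remainder approximates the excitation (fine structure).
    log_vocal = np.fft.fft(cepstrum * lifter).real
    log_excit = np.fft.fft(cepstrum * (1.0 - lifter)).real
    return log_vocal, log_excit, np.angle(spectrum)

def synthesize(log_vocal, log_excit, phase):
    """Recombine the two components: adding log spectra multiplies magnitudes."""
    magnitude = np.exp(log_vocal + log_excit)
    return np.fft.ifft(magnitude * np.exp(1j * phase)).real

# Usage on a synthetic frame (random noise under a Hann window).
rng = np.random.default_rng(0)
frame = rng.standard_normal(512) * np.hanning(512)
log_vocal, log_excit, phase = homomorphic_decompose(frame)
reconstructed = synthesize(log_vocal, log_excit, phase)
```

Because the two liftered components sum back to the original log-magnitude spectrum, resynthesis with the original phase recovers the input frame up to numerical error. Enhancement, as described in the abstract, comes from estimating the clean excitation and vocal tract before this recombination step.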
