iSEGAN: Improved Speech Enhancement Generative Adversarial Networks

Paper Title


iSEGAN: Improved Speech Enhancement Generative Adversarial Networks

Authors

Baby, Deepak

Abstract


Popular neural network-based speech enhancement systems operate on the magnitude spectrogram and ignore the phase mismatch between the noisy and clean speech signals. Conditional generative adversarial networks (cGANs) show promise in addressing the phase mismatch problem by directly mapping the raw noisy speech waveform to the underlying clean speech signal. However, stabilizing and training cGAN systems is difficult and they still fall short of the performance achieved by the spectral enhancement approaches. This paper investigates whether different normalization strategies and one-sided label smoothing can further stabilize the cGAN-based speech enhancement model. In addition, we propose incorporating a Gammatone-based auditory filtering layer and a trainable pre-emphasis layer to further improve the performance of the cGAN framework. Simulation results show that the proposed approaches improve the speech enhancement performance of cGAN systems in addition to yielding improved stability and reduced computational effort.
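The abstract names several mechanisms without spelling out their exact form. The dependency-free Python sketch below illustrates three of them under stated assumptions, and is not the authors' implementation. The trainable pre-emphasis layer is assumed to be the first-order filter y[n] = x[n] - alpha * x[n-1] with alpha learned by gradient descent. The discriminator is assumed to use a least-squares (LSGAN-style) loss, and the softened real target of 0.9 is an illustrative value, not one taken from the paper. The Gammatone filter uses the common 4th-order form with Glasberg and Moore ERB bandwidths and zero phase. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def pre_emphasis(x: Sequence[float], alpha: float) -> List[float]:
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], with x[-1] = 0.

    Hypothetical helper. In a trainable layer, alpha would be a learned
    parameter; here it is simply passed in as a float.
    """
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def pre_emphasis_grad_alpha(x: Sequence[float], upstream: Sequence[float]) -> float:
    """Gradient of a loss w.r.t. alpha, given dLoss/dy[n] in `upstream`.

    Since dy[n]/dalpha = -x[n-1] for n >= 1 (and 0 for n = 0),
    the chain rule gives dLoss/dalpha = -sum_n upstream[n] * x[n-1].
    """
    return -sum(upstream[n] * x[n - 1] for n in range(1, len(x)))


def lsgan_d_loss_one_sided(
    d_real: Sequence[float], d_fake: Sequence[float], real_target: float = 0.9
) -> float:
    """Least-squares discriminator loss with one-sided label smoothing.

    Only the real target is softened (1.0 -> real_target); the fake target
    stays at 0. The LSGAN form and the 0.9 default are assumptions.
    """
    real_term = sum((d - real_target) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def gammatone_ir(fc: float, fs: float, length: int, order: int = 4) -> List[float]:
    """Sampled gammatone impulse response, peak-normalised to 1.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with
    b = 1.019 * ERB(fc) and ERB(fc) ~= 24.7 + 0.108 * fc (Glasberg and Moore).
    Such kernels could initialise the first 1-D convolution of an encoder.
    """
    b = 1.019 * (24.7 + 0.108 * fc)
    h = []
    for k in range(length):
        t = k / fs  # time in seconds of sample k
        h.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                 * math.cos(2 * math.pi * fc * t))
    peak = max(abs(v) for v in h)
    return [v / peak for v in h]
```

For instance, `pre_emphasis([1.0, 2.0, 3.0], 0.5)` returns `[1.0, 1.5, 2.0]`. In a full cGAN these pieces would more likely be expressed as convolution layers: pre-emphasis as a 2-tap kernel `[-alpha, 1]` with a learnable `alpha`, and the Gammatone stage as a bank of `gammatone_ir` kernels at different centre frequencies. Plain Python is used here only to keep the arithmetic visible.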
