Paper title
A two-stage full-band speech enhancement model with effective spectral compression mapping
Authors
Abstract
Extending deep neural network (DNN) based wide-band speech enhancement (SE) directly to full-band processing faces the challenge of low frequency resolution in the low-frequency range, which is highly likely to degrade model performance. In this paper, we propose a learnable spectral compression mapping (SCM) that compresses the high-frequency components so that they can be processed more efficiently. This lets the model pay more attention to the low- and middle-frequency ranges, where most of the speech power is concentrated. Instead of suppressing noise in a single network structure, we first estimate a spectral magnitude mask that converts the speech to a high signal-to-noise ratio (SNR) state, and then use a subsequent model to further optimize the real and imaginary masks of the pre-enhanced signal. We conduct comprehensive experiments to validate the efficacy of the proposed method.
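The three operations named in the abstract (compressing high-frequency bins, applying a magnitude mask, then applying a real and imaginary mask) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: the function names, the bin counts, and in particular the fixed averaging matrix, which only stands in for the learnable SCM. In the proposed model the masks would be predicted by the two networks; here they are plain arrays.

```python
import numpy as np


def compression_matrix(n_bins: int, n_keep: int, group: int) -> np.ndarray:
    """Hypothetical fixed stand-in for the learnable SCM.

    Keeps the lowest `n_keep` bins unchanged and averages each run of
    `group` higher bins into a single compressed band.
    """
    n_high = n_bins - n_keep
    n_bands = -(-n_high // group)  # ceiling division
    scm = np.zeros((n_keep + n_bands, n_bins))
    scm[:n_keep, :n_keep] = np.eye(n_keep)
    for b in range(n_bands):
        lo = n_keep + b * group
        hi = min(lo + group, n_bins)
        scm[n_keep + b, lo:hi] = 1.0 / (hi - lo)
    return scm


def apply_magnitude_mask(noisy: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stage 1 (assumed form): scale the magnitude, keep the noisy phase."""
    return mask * noisy


def apply_complex_mask(spec: np.ndarray,
                       mask_real: np.ndarray,
                       mask_imag: np.ndarray) -> np.ndarray:
    """Stage 2 (assumed form): complex multiplication by the mask."""
    return (mask_real + 1j * mask_imag) * spec


# Toy usage on a single frame with 8 frequency bins.
scm = compression_matrix(n_bins=8, n_keep=4, group=2)
magnitude = np.arange(1.0, 9.0)
compressed = scm @ magnitude  # 4 low bins kept, 4 high bins merged into 2

noisy = np.array([1 + 1j, 2 - 2j])
pre_enhanced = apply_magnitude_mask(noisy, np.array([0.5, 1.0]))
enhanced = apply_complex_mask(pre_enhanced,
                              np.array([0.0, 1.0]),
                              np.array([1.0, 0.0]))
```

A real-valued magnitude mask can only rescale each bin, so the phase of the pre-enhanced signal is still the noisy phase. The complex mask in the second step can also rotate each bin, which is why a second stage can refine what the first leaves untouched.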