A Neural Beam Filter for Real-time Multi-channel Speech Enhancement

Paper title

A Neural Beam Filter for Real-time Multi-channel Speech Enhancement

Authors

Wenzhe Liu, Andong Li, Chengshi Zheng, Xiaodong Li

Abstract

Most deep learning-based multi-channel speech enhancement methods focus on designing a set of beamforming coefficients to directly filter the low signal-to-noise ratio signals received by microphones, which hinders their performance. To address this problem, this paper designs a causal neural beam filter that fully exploits the spatial-spectral information in the beam domain. Specifically, in the first stage, multiple beams are steered towards all directions using a parameterized super-directive beamformer. In the second stage, a neural spatial filter is learned by simultaneously modeling the spatial and spectral discriminability of the speech and the interference, so as to coarsely extract the desired speech. Finally, to further suppress the interference components, especially at low frequencies, a residual estimation module refines the output of the second stage. Experimental results demonstrate that the proposed approach outperforms many state-of-the-art multi-channel methods on a multi-channel speech dataset generated from the DNS-Challenge dataset.
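To make the first stage more concrete, the following is a minimal sketch of a bank of fixed super-directive beams, not the authors' implementation. It uses the textbook super-directive design under a spherically diffuse noise model, w = Γ⁻¹d / (dᴴΓ⁻¹d), with diagonal loading. The uniform linear array, the microphone spacing, the number of beams, the loading value, and all function names are assumptions chosen for illustration; the abstract does not specify how the beamformer is parameterized.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def steering_vector(num_mics, spacing, freq, theta):
    """Far-field steering vector of a uniform linear array.

    theta is measured from the array axis, in radians.
    """
    delay = spacing * math.cos(theta) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq * m * delay) for m in range(num_mics)]


def diffuse_coherence(num_mics, spacing, freq, loading):
    """Coherence matrix of spherically diffuse noise plus diagonal loading."""
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    gamma = []
    for i in range(num_mics):
        row = []
        for j in range(num_mics):
            x = k * spacing * abs(i - j)
            value = 1.0 if x == 0.0 else math.sin(x) / x
            row.append(complex(value + (loading if i == j else 0.0)))
        gamma.append(row)
    return gamma


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def superdirective_weights(num_mics, spacing, freq, theta, loading=1e-2):
    """Weights w = G^{-1} d / (d^H G^{-1} d) for one frequency and direction."""
    d = steering_vector(num_mics, spacing, freq, theta)
    gamma = diffuse_coherence(num_mics, spacing, freq, loading)
    x = solve(gamma, d)
    norm = sum(di.conjugate() * xi for di, xi in zip(d, x))
    return [xi / norm for xi in x]


def beam_response(weights, num_mics, spacing, freq, theta):
    """Response w^H d of a beam to a plane wave arriving from theta."""
    d = steering_vector(num_mics, spacing, freq, theta)
    return sum(w.conjugate() * di for w, di in zip(weights, d))


def beam_bank(num_beams, num_mics, spacing, freq):
    """Beams steered on a uniform grid from 0 to 180 degrees (assumed grid)."""
    angles = [math.pi * b / (num_beams - 1) for b in range(num_beams)]
    weights = [superdirective_weights(num_mics, spacing, freq, a) for a in angles]
    return angles, weights
```

For example, `beam_bank(7, 4, 0.04, 1000.0)` would give seven weight vectors for a hypothetical four-microphone array with 4 cm spacing at 1 kHz. Each beam passes a plane wave from its own look direction with unit gain, and applying the bank to the microphone spectra yields the beam-domain signals that a later learned stage could take as input.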
