Channel-Attention Dense U-Net for Multichannel Speech Enhancement

Paper Title

Channel-Attention Dense U-Net for Multichannel Speech Enhancement

Authors

Bahareh Tolooshams, Ritwik Giri, Andrew H. Song, Umut Isik, Arvindh Krishnaswamy

Abstract

Supervised deep learning has gained significant attention for speech enhancement recently. The state-of-the-art deep learning methods perform the task by learning a ratio/binary mask that is applied to the mixture in the time-frequency domain to produce the clean speech. Despite the great performance in the single-channel setting, these frameworks lag in performance in the multichannel setting as the majority of these methods a) fail to exploit the available spatial information fully, and b) still treat the deep architecture as a black box which may not be well-suited for multichannel audio processing. This paper addresses these drawbacks, a) by utilizing complex ratio masking instead of masking on the magnitude of the spectrogram, and more importantly, b) by introducing a channel-attention mechanism inside the deep architecture to mimic beamforming. We propose Channel-Attention Dense U-Net, in which we apply the channel-attention unit recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming. We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.
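The two ideas named in the abstract, complex ratio masking and attention across channels, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the function names, the array shapes, and the similarity-based softmax weighting are assumptions chosen for illustration only. The first function applies a complex mask to a mixture in the time-frequency domain. The second recombines channels with attention weights computed per frequency bin, which is loosely analogous to a beamformer weighting microphones.

```python
import numpy as np


def apply_complex_ratio_mask(mixture_stft, mask):
    """Apply a complex mask by element-wise multiplication.

    Unlike a magnitude-only mask, a complex mask can change both the
    magnitude and the phase of every time-frequency bin.
    """
    return mask * mixture_stft


def channel_attention(features):
    """Toy attention across channels (hypothetical, for illustration).

    features: complex array of shape (channels, freq, time).
    Returns an array of the same shape in which every output channel is a
    softmax-weighted combination of all input channels, per frequency bin.
    """
    # Channel-to-channel similarity per frequency: (freq, channels, channels)
    scores = np.einsum("cft,dft->fcd", features, np.conj(features))
    scores = np.abs(scores) / features.shape[-1]
    # Numerically stable softmax over the last (source channel) axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output channel c = sum over d of weights[f, c, d] * features[d, f, t]
    return np.einsum("fcd,dft->cft", weights, features)


# Usage with random data: 4 channels, 5 frequency bins, 8 time frames
rng = np.random.default_rng(0)
shape = (4, 5, 8)
mixture = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = rng.standard_normal(shape[1:]) + 1j * rng.standard_normal(shape[1:])

attended = channel_attention(mixture)
enhanced = apply_complex_ratio_mask(attended[0], mask)
```

In a trained network the mask and the attention weights would be predicted from learned feature maps rather than drawn at random or computed from raw similarities; the sketch only shows the shape of the two operations.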
