Paper Title

Exploring Self-Attention Mechanisms for Speech Separation

Paper Authors

Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi

Paper Abstract

Transformers have enabled impressive improvements in deep learning. They often outperform recurrent and convolutional models in many tasks while taking advantage of parallel processing. Recently, we proposed the SepFormer, which obtains state-of-the-art performance in speech separation with the WSJ0-2/3 Mix datasets. This paper studies in-depth Transformers for speech separation. In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!. Moreover, we extend our model to perform speech enhancement and provide experimental evidence on denoising and dereverberation tasks. Finally, we investigate, for the first time in speech separation, the use of efficient self-attention mechanisms such as Linformers, Longformers, and Reformers. We found that they reduce memory requirements significantly. For example, we show that the Reformer-based attention outperforms the popular Conv-TasNet model on the WSJ0-2Mix dataset while being faster at inference and comparable in terms of memory consumption.
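
As background for the memory issue the abstract refers to: standard self-attention materializes a T×T attention matrix over the T time frames of an utterance, so memory grows quadratically with sequence length, which is what efficient variants such as the Linformer, Longformer, and Reformer are designed to mitigate. The sketch below is a minimal, generic NumPy illustration of standard scaled dot-product self-attention; it is not the authors' SepFormer code, and all names and sizes are placeholders chosen for the example.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (T, d).

    The (T, T) weight matrix built here is the quadratic-memory term that
    efficient variants (Linformer, Longformer, Reformer) approximate or sparsify.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # linear projections, each (T, d)
    scores = (q @ k.T) / np.sqrt(k.shape[-1])        # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key axis
    return weights @ v                               # (T, d) context vectors

# Toy usage: 1000 time frames with 64-dim features -> a 1000 x 1000 attention matrix.
rng = np.random.default_rng(0)
T, d = 1000, 64
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
print(scaled_dot_product_self_attention(x, w_q, w_k, w_v).shape)  # (1000, 64)
```

Doubling T quadruples the size of the (T, T) matrix, which is why long input mixtures make standard attention memory-hungry and why the paper's reported memory savings from efficient attention variants matter for separation.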
