A light-weight full-band speech enhancement model

Paper title

A light-weight full-band speech enhancement model

Authors

Qinwen Hu, Zhongshu Hou, Xiaohuai Le, Jing Lu

Abstract

Deep neural network based full-band speech enhancement systems face challenges of high demand of computational resources and imbalanced frequency distribution. In this paper, a light-weight full-band model is proposed with two dedicated strategies, i.e., a learnable spectral compression mapping for more effective high-band spectral information compression, and the utilization of the multi-head attention mechanism for more effective modeling of the global spectral pattern. Experiments validate the efficacy of the proposed strategies and show that the proposed model achieves competitive performance with only 0.89M parameters.
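
The abstract does not spell out how the learnable spectral compression mapping is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the low-band bins are passed through unchanged while the high-band bins are mapped to fewer bins by a weight matrix. The names `init_compression_matrix` and `compress_spectrum`, the band split, and the group-averaging initialization are all hypothetical; in a trained model the weights would be learned rather than fixed.

```python
def init_compression_matrix(n_high, n_out):
    """Build an (n_out x n_high) matrix whose rows average contiguous bin groups.

    Assumption: this is only a plausible starting point that a trainable
    layer could refine; the paper's actual initialization is not given.
    """
    assert n_high % n_out == 0, "n_high must be divisible by n_out in this sketch"
    group = n_high // n_out
    return [
        [1.0 / group if r * group <= c < (r + 1) * group else 0.0
         for c in range(n_high)]
        for r in range(n_out)
    ]


def compress_spectrum(frame, n_low, weights):
    """Keep the first n_low bins and map the remaining bins through weights."""
    low, high = frame[:n_low], frame[n_low:]
    # Matrix-vector product: each output bin is a weighted sum of high-band bins.
    compressed = [sum(w * x for w, x in zip(row, high)) for row in weights]
    return low + compressed


# Toy magnitude frame with 8 bins: 4 low-band bins kept, 4 high-band bins -> 2.
frame = [1.0, 2.0, 3.0, 4.0, 4.0, 6.0, 8.0, 10.0]
weights = init_compression_matrix(n_high=4, n_out=2)
out = compress_spectrum(frame, n_low=4, weights=weights)
print(out)  # [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
```

In this toy setup the frame shrinks from 8 to 6 bins, which illustrates why compressing the high band can reduce the computation spent on sparsely informative frequencies.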
