Paper Title
Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation
Paper Authors
Paper Abstract
Speaker-independent speech separation has achieved remarkable performance in recent years with the development of deep neural networks (DNNs). Various network architectures, from traditional convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to advanced transformers, have been elaborately designed to improve separation performance. However, state-of-the-art models usually suffer from several computation-related drawbacks, such as large model size, high memory consumption, and heavy computational complexity. To strike a balance between performance and computational efficiency, and to further explore the modeling ability of traditional network structures, we combine RNNs with a newly proposed variant of convolutional network to address the speech separation problem. By embedding two RNNs into the basic block of this variant with the help of a dual-path strategy, the proposed network can effectively learn both local information and global dependencies. In addition, a four-stage structure enables the separation procedure to be performed gradually at progressively finer scales as the feature dimension increases. Experimental results on various datasets demonstrate the effectiveness of the proposed method and show that a good trade-off between separation performance and computational efficiency is achieved.
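To make the dual-path idea in the abstract concrete, the following is a minimal, hypothetical sketch of a block that embeds two RNNs: an intra-chunk RNN that models local information within each chunk and an inter-chunk RNN that models global dependencies across chunks. The module name `DualPathRNNBlock`, the choice of bidirectional LSTMs, layer normalization, residual connections, and all tensor dimensions are illustrative assumptions and do not reproduce the paper's exact basic block or its surrounding convolutional variant.

```python
# Hypothetical dual-path block: not the paper's exact design, only an
# illustration of embedding two RNNs (local + global paths) in one block.
import torch
import torch.nn as nn


class DualPathRNNBlock(nn.Module):
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        # Intra-chunk RNN: processes each chunk independently (local path).
        self.intra_rnn = nn.LSTM(channels, hidden, batch_first=True,
                                 bidirectional=True)
        self.intra_proj = nn.Linear(2 * hidden, channels)
        self.intra_norm = nn.LayerNorm(channels)
        # Inter-chunk RNN: processes across chunks (global path).
        self.inter_rnn = nn.LSTM(channels, hidden, batch_first=True,
                                 bidirectional=True)
        self.inter_proj = nn.Linear(2 * hidden, channels)
        self.inter_norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, num_chunks, chunk_len)
        b, c, k, s = x.shape
        # Intra-chunk (local) path: RNN runs along the chunk-length axis.
        intra = x.permute(0, 2, 3, 1).reshape(b * k, s, c)
        out, _ = self.intra_rnn(intra)
        intra = self.intra_norm(self.intra_proj(out)).reshape(b, k, s, c)
        x = x + intra.permute(0, 3, 1, 2)      # residual connection
        # Inter-chunk (global) path: RNN runs along the chunk axis.
        inter = x.permute(0, 3, 2, 1).reshape(b * s, k, c)
        out, _ = self.inter_rnn(inter)
        inter = self.inter_norm(self.inter_proj(out)).reshape(b, s, k, c)
        return x + inter.permute(0, 3, 2, 1)   # residual connection
```

In a layout of this kind, the segmented feature map of shape (batch, channels, num_chunks, chunk_len) is scanned along the chunk-length axis by the intra-chunk RNN and along the chunk axis by the inter-chunk RNN, which is how the local and global modeling described in the abstract are kept on separate paths within a single block.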