论文标题

多频信息增强了扬声器表示学习的频道注意模块

Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

论文作者

Sang, Mufan, Hansen, John H. L.

论文摘要

最近,注意机制已成功应用于基于神经网络的说话者验证系统。将挤压和兴奋块纳入卷积神经网络中的表现出色。但是,它使用全球平均池(GAP)简单地沿时间和频率维度平均,这是在功能地图中保留足够的扬声器信息的能力。在这项研究中,我们表明GAP是在数学上仅使用频率分解中最低频率分量的时间频域上离散余弦变换(DCT)的特殊情况。为了增强扬声器信息提取能力,我们建议利用多频信息,并设计两个新颖而有效的注意模块,称为单频率单通道(SFSC)注意模块和多频单通道(MFSC)注意模块。提出的注意模块可以根据DCT有效地从多个频率组件中捕获更多扬声器信息。我们在Voxceleb数据集上进行了全面的实验,并对第148个UTD法医语料库进行了探测评估。实验结果表明,我们提出的SFSC和MFSC注意模块可以有效地产生更具歧视性的扬声器表示,并且优于RESNET34-SE和ECAPA-TDNN系统,而EER降低了20.9%和20.2%,而无需添加额外的网络参数。

Recently, attention mechanisms have been applied successfully in neural network-based speaker verification systems. Incorporating the Squeeze-and-Excitation block into convolutional neural networks has achieved remarkable performance. However, it uses global average pooling (GAP) to simply average the features along time and frequency dimensions, which is incapable of preserving sufficient speaker information in the feature maps. In this study, we show that GAP is a special case of a discrete cosine transform (DCT) on time-frequency domain mathematically using only the lowest frequency component in frequency decomposition. To strengthen the speaker information extraction ability, we propose to utilize multi-frequency information and design two novel and effective attention modules, called Single-Frequency Single-Channel (SFSC) attention module and Multi-Frequency Single-Channel (MFSC) attention module. The proposed attention modules can effectively capture more speaker information from multiple frequency components on the basis of DCT. We conduct comprehensive experiments on the VoxCeleb datasets and a probe evaluation on the 1st 48-UTD forensic corpus. Experimental results demonstrate that our proposed SFSC and MFSC attention modules can efficiently generate more discriminative speaker representations and outperform ResNet34-SE and ECAPA-TDNN systems with relative 20.9% and 20.2% reduction in EER, without adding extra network parameters.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源