Paper Title

ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

Authors

Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck

Abstract

Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network (TDNN) that applies statistics pooling to project variable-length utterances into fixed-length speaker characterizing embeddings. In this paper, we propose multiple enhancements to this architecture based on recent trends in the related fields of face verification and computer vision. Firstly, the initial frame layers can be restructured into 1-dimensional Res2Net modules with impactful skip connections. Similarly to SE-ResNet, we introduce Squeeze-and-Excitation blocks in these modules to explicitly model channel interdependencies. The SE block expands the temporal context of the frame layer by rescaling the channels according to global properties of the recording. Secondly, neural networks are known to learn hierarchical features, with each layer operating on a different level of complexity. To leverage this complementary information, we aggregate and propagate features of different hierarchical levels. Finally, we improve the statistics pooling module with channel-dependent frame attention. This enables the network to focus on different subsets of frames during each of the channel's statistics estimation. The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.
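To make the last enhancement in the abstract concrete, here is a minimal sketch of statistics pooling with channel-dependent frame attention. It is an illustration under stated assumptions, not the authors' implementation: the function names `softmax` and `attentive_stats_pooling` are hypothetical, and the attention scores are taken as given inputs, since the abstract does not say how they are computed. For each channel, the scores are normalized over time and used to weight that channel's mean and standard deviation, so an utterance of any length maps to a fixed-length vector.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(frames, scores):
    """Pool T frames of C channels into a vector of length 2 * C.

    frames: list of T lists, each holding C frame-level features.
    scores: list of T lists, each holding C unnormalized attention
            scores (assumed to be produced elsewhere in the network).
    Returns the C weighted means followed by the C weighted
    standard deviations.
    """
    num_channels = len(frames[0])
    means, stds = [], []
    for c in range(num_channels):
        # Each channel gets its own weights over the time axis.
        weights = softmax([row[c] for row in scores])
        mean = sum(w * row[c] for w, row in zip(weights, frames))
        second_moment = sum(w * row[c] ** 2 for w, row in zip(weights, frames))
        # Clamp at zero to guard against tiny negative rounding errors.
        variance = max(second_moment - mean ** 2, 0.0)
        means.append(mean)
        stds.append(math.sqrt(variance))
    return means + stds


# Two utterances of different lengths yield embeddings of the same size.
short = attentive_stats_pooling([[1.0, 0.0], [3.0, 0.0]],
                                [[0.0, 0.0], [0.0, 0.0]])
longer = attentive_stats_pooling([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                                 [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

With all scores equal, the weights are uniform and the result reduces to ordinary statistics pooling, as in the x-vector baseline that the abstract describes. Unequal scores let each channel emphasize its own subset of frames.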
