Paper Title

ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020

Authors

Chen, Shen

Abstract

In this report, we describe the submission of the ShaneRun team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We use ResNet-34 as the encoder to extract speaker embeddings, following the open-source voxceleb-trainer. We also provide a simple method for optimum fusion that uses the t-SNE normalized distance of test utterance pairs instead of the original negative Euclidean distance from the encoder. The final submitted system achieved 0.3098 minDCF and 5.076% EER on the Fixed data track, outperforming the baseline by 1.3% minDCF and 2.2% EER, respectively.
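
For readers unfamiliar with the scoring terms in the abstract, the following is a minimal sketch of how a trial can be scored with the negative Euclidean distance between two speaker embeddings, and how an equal error rate can be approximated from a set of scored trials. It is an illustration only, not the authors' code: the function names neg_euclidean_score and equal_error_rate are hypothetical, the embeddings are assumed to be plain lists of floats, and the t-SNE normalization and fusion steps are omitted because the abstract does not specify them.

```python
import math


def neg_euclidean_score(emb_a, emb_b):
    """Score a trial: higher (closer to zero) means more likely the same speaker."""
    return -math.dist(emb_a, emb_b)


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for a same-speaker (target) trial, 0 for a different-speaker trial.
    A trial is accepted when its score is >= the threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # False acceptance rate: non-target trials that pass the threshold.
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        # False rejection rate: target trials that fail the threshold.
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In practice the scores would come from calling neg_euclidean_score on the encoder embeddings of each test utterance pair. Evaluation toolkits typically interpolate between thresholds, so the value returned here is only a coarse approximation on small trial lists.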
