Paper title
ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020
Authors
Abstract
In this report, we describe the ShaneRun team's submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We use ResNet-34 as the encoder to extract speaker embeddings; the encoder is taken from the open-source voxceleb-trainer. We also provide a simple method for optimum fusion that uses the t-SNE normalized distance of test utterance pairs instead of the original negative Euclidean distance from the encoder. The final submitted system achieved 0.3098 minDCF and 5.076% EER on the Fixed data track, outperforming the baseline by 1.3% minDCF and 2.2% EER, respectively.
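For readers unfamiliar with the scoring terms in the abstract, the following is a minimal sketch of how a trial pair might be scored with the negative Euclidean distance between two speaker embeddings, and how an equal error rate could be approximated from a list of scored trials. It does not reproduce the t-SNE normalization or the fusion step, whose details are not given here. The function names `neg_euclidean_score` and `equal_error_rate` are hypothetical, and the threshold sweep is one simple approximation, not necessarily the challenge's official scoring.

```python
import numpy as np


def neg_euclidean_score(emb_a, emb_b):
    """Score a trial pair; values closer to 0 suggest the same speaker.

    Hypothetical helper: the embeddings are assumed to be 1-D vectors
    of equal length produced by some speaker encoder.
    """
    diff = np.asarray(emb_a, dtype=float) - np.asarray(emb_b, dtype=float)
    return -float(np.linalg.norm(diff))


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for target (same speaker) trials, 0 for non-target trials.
    Returns the mean of the false acceptance and false rejection rates
    at the threshold where the two are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        accept = scores >= threshold
        far = np.mean(accept[labels == 0])    # non-targets accepted
        frr = np.mean(~accept[labels == 1])   # targets rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

In this sketch, a higher score means a closer pair, so a trial is accepted when its score is at or above the threshold.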