ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020

Paper Title

ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020

Authors

Chen, Shen

Abstract

In this report, we describe the ShaneRun team's submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We use ResNet-34 as the encoder to extract speaker embeddings, taken from the open-source voxceleb-trainer. We also present a simple method for optimum fusion that uses the t-SNE-normalized distance between test utterance pairs instead of the original negative Euclidean distance from the encoder. The final submitted system achieved a minDCF of 0.3098 and an EER of 5.076% on the Fixed data track, outperforming the baseline by 1.3% in minDCF and 2.2% in EER, respectively.
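
The abstract mentions scoring trial pairs by negative Euclidean distance between embeddings and reporting an equal error rate. As a rough illustration only, the sketch below shows one common way such a score and an approximate equal error rate could be computed with NumPy. The function names are hypothetical, the embeddings would come from whatever encoder is in use, and the t-SNE normalization and fusion step are not reproduced here because the abstract does not specify them. This is not the author's implementation.

```python
import numpy as np


def neg_euclidean_score(emb_a, emb_b):
    """Score one trial pair; values closer to 0 mean more similar embeddings."""
    diff = np.asarray(emb_a, dtype=float) - np.asarray(emb_b, dtype=float)
    return -float(np.linalg.norm(diff))


def equal_error_rate(scores, labels):
    """Approximate the equal error rate by using every observed score as a threshold.

    labels: 1/True for same-speaker (target) trials, 0/False for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        accept = scores >= threshold
        far = np.mean(accept[~labels])   # fraction of impostor trials accepted
        frr = np.mean(~accept[labels])   # fraction of target trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

A threshold sweep over observed scores is a simple approximation; evaluation toolkits typically interpolate the error curves, so values may differ slightly on real trial lists.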

Paper Title