The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Paper title

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Authors

Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li

Abstract

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020) addresses three research problems under well-defined conditions: far-field text-dependent speaker verification from a single microphone array, far-field text-independent speaker verification from a single microphone array, and far-field text-dependent speaker verification from distributed microphone arrays. All three tasks pose a cross-channel challenge to the participants. To simulate a real-life scenario, the enrollment utterances are recorded with a close-talk cellphone, while the test utterances are recorded with far-field microphone arrays. In this paper, we describe the database, the challenge, and the baseline system, a ResNet-based deep speaker network with cosine similarity scoring. For a given utterance, the speaker embeddings of the different channels are averaged with equal weights to form the final embedding. The baseline system achieves minDCFs of 0.62, 0.66, and 0.64 and EERs of 6.27%, 6.55%, and 7.18% for task 1, task 2, and task 3, respectively.
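
The scoring step described in the abstract (equal-weight averaging of per-channel embeddings, followed by cosine similarity against the enrollment embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_embeddings` and `cosine_score` and the three-dimensional toy vectors are assumptions made for illustration, and a real system would obtain its embeddings from the ResNet speaker network.

```python
import math


def average_embeddings(channel_embeddings):
    """Return the equal-weight mean of per-channel embeddings.

    channel_embeddings: list of equal-length vectors, one per channel.
    """
    n_channels = len(channel_embeddings)
    dim = len(channel_embeddings[0])
    return [
        sum(emb[i] for emb in channel_embeddings) / n_channels
        for i in range(dim)
    ]


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy data: one close-talk enrollment embedding and
# two far-field channel embeddings for the same test utterance.
enroll_embedding = [1.0, 0.0, 1.0]
test_channel_embeddings = [
    [1.0, 0.2, 0.8],
    [0.8, -0.2, 1.2],
]

# Average the channels first, then score the trial.
test_embedding = average_embeddings(test_channel_embeddings)
score = cosine_score(enroll_embedding, test_embedding)
print(f"trial score: {score:.4f}")
```

A higher score indicates that the test utterance is more likely to come from the enrolled speaker. The accept/reject threshold is a separate tuning decision that this sketch does not cover.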
