Paper Title
Pairwise Discriminative Neural PLDA for Speaker Verification
Paper Authors
Paper Abstract
The state-of-the-art approach to speaker verification involves extracting discriminative embeddings such as x-vectors, followed by a generative back-end based on probabilistic linear discriminant analysis (PLDA). In this paper, we propose a pairwise neural discriminative model for speaker verification that operates on a pair of speaker embeddings, such as x-vectors or i-vectors, and outputs a score that can be interpreted as a scaled log-likelihood ratio. We construct a differentiable cost function that approximates the speaker verification loss, namely the minimum detection cost. The pre-processing steps of linear discriminant analysis (LDA), unit-length normalization, and within-class covariance normalization are all modeled as layers of the neural model, so the speaker verification cost function can be back-propagated through these layers during training. We also explore regularization techniques to prevent overfitting, which is a major concern when using discriminative back-end models for verification tasks. The experiments are performed on the NIST SRE 2018 development and evaluation datasets. We observe average relative improvements of 8% in the CMN2 condition and 30% in the VAST condition over the PLDA baseline system.
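The abstract does not give the exact form of the differentiable cost, so the following is only a minimal sketch of one common way to smooth a detection cost: the hard counts of misses and false alarms are replaced by sigmoids of the score-to-threshold distance. The function names, the sharpness factor `alpha`, and the fixed `threshold` and `beta` arguments are assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_detection_cost(scores, labels, threshold, beta, alpha=10.0):
    """Smooth surrogate for the normalized detection cost P_miss + beta * P_fa.

    scores: one score per trial (higher means "same speaker")
    labels: 1 for target trials, 0 for non-target trials
    alpha: hypothetical sharpness factor; larger values approach hard counting
    """
    target = [s for s, t in zip(scores, labels) if t == 1]
    nontarget = [s for s, t in zip(scores, labels) if t == 0]
    # Soft miss rate: target trials whose score falls below the threshold.
    p_miss = sum(sigmoid(alpha * (threshold - s)) for s in target) / len(target)
    # Soft false-alarm rate: non-target trials whose score exceeds the threshold.
    p_fa = sum(sigmoid(alpha * (s - threshold)) for s in nontarget) / len(nontarget)
    return p_miss + beta * p_fa


# Toy trials: two target pairs with high scores, two non-target pairs with low scores.
cost = soft_detection_cost([2.0, 1.5, -1.0, -2.0], [1, 1, 0, 0], threshold=0.0, beta=1.0)
```

Because every term is a smooth function of the scores, a cost of this shape can in principle be minimized by gradient descent together with the layers that produce the scores. In this sketch the threshold is simply passed in; whether it is learned or swept is left open here.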