Momentum Contrast Speaker Representation Learning

Paper Title

Momentum Contrast Speaker Representation Learning

Authors

Jangho Lee, Jaihyun Koh, Sungroh Yoon

Abstract

Unsupervised representation learning has shown remarkable achievement by reducing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend the technique of unsupervised learning to the speech domain, we propose the Momentum Contrast for VoxCeleb (MoCoVox) as a form of learning mechanism. We pre-trained the MoCoVox on the VoxCeleb1 by implementing instance discrimination. Applying MoCoVox for speaker verification revealed that it outperforms the state-of-the-art metric learning-based approach by a large margin. We also empirically demonstrate the features of contrastive learning in the speech domain by analyzing the distribution of learned representations. Furthermore, we explored which pretext task is adequate for speaker verification. We expect that learning speaker representation without human supervision helps to address the open-set speaker recognition.
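
The abstract names the mechanism (momentum contrast with instance discrimination) but gives no implementation details. As a rough illustration only, the sketch below follows the general Momentum Contrast recipe: a key encoder that trails the query encoder through a momentum update, and a contrastive (InfoNCE) loss that scores one positive key against a set of negatives. The function names, the momentum `m=0.999`, and the temperature `0.07` are assumptions chosen for illustration, not values reported for MoCoVox, and plain Python lists stand in for real encoder outputs.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter slowly toward its query-encoder counterpart.

    m is an assumed illustrative momentum, not a value taken from the paper.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """Cross-entropy over similarity logits, with the positive key as the target.

    The temperature is an assumed illustrative value.
    """
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]
```

In an instance discrimination setup, the query and the positive key would be embeddings of two views of the same utterance, while the negatives would come from other utterances. The loss is small when the query is closest to its positive key and grows when a negative is closer.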
