Paper title
DGC-vector: A new speaker embedding for zero-shot voice conversion
Paper authors
Paper abstract
Recently, a growing number of zero-shot voice conversion algorithms have been proposed. As a fundamental part of zero-shot voice conversion, the speaker embedding is key to improving the speaker similarity of the converted speech. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve speaker similarity in zero-shot voice conversion, we propose a novel speaker representation method. Our method combines the advantages of the D-vector, global style token (GST) based speaker representation, and auxiliary supervision. Objective and subjective evaluations show that the proposed method achieves decent performance on zero-shot voice conversion and significantly improves speaker similarity over D-vector and GST-based speaker embeddings.