DGC-vector: A new speaker embedding for zero-shot voice conversion

Paper title

DGC-vector: A new speaker embedding for zero-shot voice conversion

Authors

Xiao, Ruitong; Zhang, Haitong; Lin, Yue

Abstract

Recently, more and more zero-shot voice conversion algorithms have been proposed. As a fundamental part of zero-shot voice conversion, speaker embeddings are the key to improving the converted speech's speaker similarity. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve the speaker similarity in zero-shot voice conversion, we propose a novel speaker representation method in this paper. Our method combines the advantages of D-vector, global style token (GST) based speaker representation and auxiliary supervision. Objective and subjective evaluations show that the proposed method achieves a decent performance on zero-shot voice conversion and significantly improves speaker similarity over D-vector and GST-based speaker embedding.
