Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

Paper Title

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

Authors

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract

In this paper, we propose Adapitch, a multi-speaker TTS method that adapts the supervised module using untranscribed data. We design two self-supervised modules that train the text encoder and the mel decoder separately on untranscribed data to enhance the text and mel representations. To better handle prosody in the synthesized voice, the supervised TTS module is conditioned on the disentanglement of pitch, text, and speaker content. Training is split into two phases: the text encoder and mel decoder are first pretrained in an unsupervised mode and then fixed, after which the disentangling TTS module is trained in a supervised mode. Experimental results show that Adapitch achieves much better quality than the baseline methods.
