Title
Scyclone: High-Quality and Parallel-Data-Free Voice Conversion Using Spectrogram and Cycle-Consistent Adversarial Networks
Authors
Abstract
This paper proposes Scyclone, a high-quality voice conversion (VC) technique that does not require parallel training data. Scyclone improves the speech naturalness and speaker similarity of the converted speech by combining CycleGAN-based spectrogram conversion with a simplified WaveRNN-based vocoder. In Scyclone, a linear spectrogram is used as the conversion feature instead of vocoder parameters, which avoids the quality degradation caused by extraction errors in the fundamental frequency and voiced/unvoiced parameters. The spectrograms of the source and target speakers are modeled by modified CycleGAN networks, and the waveform is reconstructed using the simplified WaveRNN with a single Gaussian probability density function. Subjective experiments with completely unpaired training data show that Scyclone is significantly better than CycleGAN-VC2, one of the existing state-of-the-art parallel-data-free VC techniques.
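For reference, the standard CycleGAN-VC objective that this kind of unpaired spectrogram conversion builds on can be written as follows. This is a sketch of the generic formulation, not Scyclone's exact loss: the abstract does not say how the networks are modified. The symbols are introduced here for illustration: $X$ and $Y$ are source- and target-speaker spectrograms, $G_{X\to Y}$ and $G_{Y\to X}$ are the two generators, $D_X$ and $D_Y$ the discriminators, and $\lambda_{\mathrm{cyc}}$, $\lambda_{\mathrm{id}}$ are loss weights.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{adv}}(G_{X\to Y}, D_Y) &= \mathbb{E}_{y}\big[\log D_Y(y)\big] + \mathbb{E}_{x}\big[\log\big(1 - D_Y(G_{X\to Y}(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}} &= \mathbb{E}_{x}\big[\lVert G_{Y\to X}(G_{X\to Y}(x)) - x\rVert_1\big] + \mathbb{E}_{y}\big[\lVert G_{X\to Y}(G_{Y\to X}(y)) - y\rVert_1\big] \\
\mathcal{L}_{\mathrm{id}} &= \mathbb{E}_{y}\big[\lVert G_{X\to Y}(y) - y\rVert_1\big] + \mathbb{E}_{x}\big[\lVert G_{Y\to X}(x) - x\rVert_1\big] \\
\mathcal{L} &= \mathcal{L}_{\mathrm{adv}}(G_{X\to Y}, D_Y) + \mathcal{L}_{\mathrm{adv}}(G_{Y\to X}, D_X) + \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}} + \lambda_{\mathrm{id}}\,\mathcal{L}_{\mathrm{id}}
\end{aligned}
```

The generators minimize $\mathcal{L}$ while the discriminators maximize the adversarial terms. The cycle-consistency term is what allows training without paired utterances: a spectrogram converted to the other speaker and back must reproduce the original.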
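The single Gaussian output of the simplified WaveRNN vocoder can be sketched as below: training minimizes the Gaussian negative log-likelihood of each waveform sample, and generation draws each sample from the predicted distribution. Assumptions: the network is taken to emit a mean `mu` and a log standard deviation `log_sigma` per time step, the floor `LOG_SIGMA_MIN` is a hypothetical value chosen for numerical stability, and the helper names are illustrative rather than taken from the paper.

```python
import numpy as np

# Hypothetical lower bound on log(sigma) so the density cannot collapse;
# not a value taken from the paper.
LOG_SIGMA_MIN = -7.0


def gaussian_nll(x, mu, log_sigma):
    """Mean negative log-likelihood of waveform samples x under N(mu, sigma^2).

    Assumes the vocoder predicts mu and log(sigma) for every time step.
    """
    log_sigma = np.maximum(log_sigma, LOG_SIGMA_MIN)
    sigma = np.exp(log_sigma)
    # -log N(x; mu, sigma^2) = 0.5*log(2*pi) + log(sigma) + 0.5*((x - mu)/sigma)^2
    nll = 0.5 * np.log(2.0 * np.pi) + log_sigma + 0.5 * ((x - mu) / sigma) ** 2
    return float(np.mean(nll))


def sample_gaussian(mu, log_sigma, rng):
    """Draw one sample per time step via x = mu + sigma * eps, eps ~ N(0, 1)."""
    log_sigma = np.maximum(log_sigma, LOG_SIGMA_MIN)
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(log_sigma) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.zeros(4)
    log_sigma = np.zeros(4)
    print(gaussian_nll(np.zeros(4), mu, log_sigma))  # 0.5*log(2*pi), about 0.9189
    print(sample_gaussian(mu, log_sigma, rng))
```

Compared with a softmax over quantized amplitude levels, as in the original WaveRNN, a single Gaussian needs only two outputs per time step. In an actual autoregressive vocoder, each drawn sample would be fed back as input for the next step; the sketch shows only the per-step output distribution.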