Paper Title
CinC-GAN for Effective F0 Prediction for Whisper-to-Normal Speech Conversion
Paper Authors
Paper Abstract
Recently, Generative Adversarial Network (GAN)-based methods have shown remarkable performance in voice conversion and WHiSPer-to-normal SPeeCH (WHSP2SPCH) conversion. One of the key challenges in WHSP2SPCH conversion is the prediction of the fundamental frequency (F0). Recently, a state-of-the-art method based on Cycle-Consistent Generative Adversarial Networks (CycleGAN) was proposed for WHSP2SPCH conversion. The CycleGAN-based method uses two different models, one for Mel Cepstral Coefficient (MCC) mapping and another for F0 prediction, where the F0 prediction depends heavily on the pre-trained MCC mapping model. This dependence introduces additional non-linear noise into the predicted F0. To suppress this noise, we propose the Cycle-in-Cycle GAN (CinC-GAN), which is specifically designed to make F0 prediction more effective without losing accuracy in MCC mapping. We evaluate the proposed method in a non-parallel setting and analyze it on speaker-specific and gender-specific tasks. Objective and subjective tests show that CinC-GAN significantly outperforms CycleGAN. In addition, we analyze CycleGAN and CinC-GAN for unseen speakers, and the results show the clear superiority of CinC-GAN.
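The abstract builds on the cycle-consistency idea behind CycleGAN: a whispered feature sequence mapped to normal speech and back should return close to where it started. The following is a minimal sketch of that generic loss term only, not the CinC-GAN architecture or the authors' implementation. The function names, the L1 distance, and the toy scaling "generators" are all assumptions chosen for illustration; real generators would be trained neural networks operating on MCC frames.

```python
from typing import Callable, List

Frames = List[List[float]]  # a sequence of feature frames (e.g. MCC vectors); assumed layout
Generator = Callable[[Frames], Frames]


def l1_distance(a: Frames, b: Frames) -> float:
    """Mean absolute difference over all frames and coefficients."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for x, y in zip(frame_a, frame_b):
            total += abs(x - y)
            count += 1
    return total / count if count else 0.0


def cycle_consistency_loss(whisper: Frames, speech: Frames,
                           g_w2s: Generator, g_s2w: Generator) -> float:
    """Sum of the forward (whisper -> speech -> whisper) and
    backward (speech -> whisper -> speech) reconstruction errors."""
    forward_cycle = g_s2w(g_w2s(whisper))
    backward_cycle = g_w2s(g_s2w(speech))
    return l1_distance(forward_cycle, whisper) + l1_distance(backward_cycle, speech)


def scale(factor: float) -> Generator:
    """Hypothetical stand-in for a trained generator: scales every coefficient."""
    return lambda frames: [[factor * x for x in frame] for frame in frames]


whisper = [[1.0, 2.0], [3.0, 4.0]]
speech = [[2.0, 4.0], [6.0, 8.0]]

# Generators that invert each other exactly give zero cycle loss.
perfect = cycle_consistency_loss(whisper, speech, scale(2.0), scale(0.5))
# A mismatched pair leaves a residual that training would penalize.
imperfect = cycle_consistency_loss(whisper, speech, scale(2.0), scale(0.4))
```

In a full system this term would be combined with adversarial losses, and a separate F0 predictor would be trained on top of the MCC mapping; how CinC-GAN couples the two is described in the paper itself and is not reproduced here.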