Paper title
How Far Are We from Robust Voice Conversion: A Survey
Paper authors
Paper abstract
Voice conversion (VC) technologies have improved greatly in recent years with the help of deep learning, but their ability to produce natural-sounding utterances under different conditions remains unclear. In this paper, we present a thorough study of the robustness of known VC models. We also modified these models, for example by replacing their speaker embeddings, to further improve their performance. We found that the sampling rate and audio duration greatly influence voice conversion. All the VC models degrade on unseen data, but AdaIN-VC is relatively more robust. In addition, jointly trained speaker embeddings are more suitable for voice conversion than those trained on speaker identification.