Paper title

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

Authors

Comini, Giulia; Huybrechts, Goeric; Ribeiro, Manuel Sam; Gabrys, Adam; Lorenzo-Trueba, Jaime

Abstract

The availability of data in expressive styles across languages is limited, and recording sessions are costly and time-consuming. To overcome these issues, we demonstrate how to build low-resource neural text-to-speech (TTS) voices with only 1 hour of conversational speech when no other conversational data are available in the same language. Assuming that non-expressive speech data are available in that language, we propose a three-step approach: 1) we train an F0-conditioned voice conversion (VC) model as a data augmentation technique; 2) we train an F0 predictor to control the conversational flavour of the voice-converted synthetic data; 3) we train a TTS system on the augmented data. We show that our approach enables F0 controllability, scales across speakers and languages, and is competitive in naturalness with a state-of-the-art baseline, an alternative augmentation method that does not use F0 information.
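The data flow of the three steps can be pictured with a toy sketch. This is an illustration under stated assumptions, not the authors' implementation: the VC model is a hypothetical placeholder that only relabels metadata and attaches the conditioning F0 contour, and the learned F0 predictor is swapped for a simple linear log-F0 mean/variance transform (a classic voice-conversion baseline). The `flavour` factor is a hypothetical stand-in for style control, and every name (`Utterance`, `predict_conversational_f0`, `f0_conditioned_vc`, `build_augmented_corpus`) is illustrative.

```python
import math
import statistics
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    text: str
    speaker: str
    style: str                   # e.g. "neutral" or "conversational"
    f0: List[Optional[float]]    # Hz per frame; None marks unvoiced frames


def log_f0_stats(utts: Sequence[Utterance]) -> Tuple[float, float]:
    """Mean and population std of log-F0 over all voiced frames of a corpus."""
    logs = [math.log(v) for u in utts for v in u.f0 if v is not None]
    return statistics.fmean(logs), statistics.pstdev(logs)


def predict_conversational_f0(f0, src_stats, tgt_stats, flavour=1.0):
    """Hypothetical stand-in for a learned F0 predictor (step 2).

    Maps each voiced frame linearly in the log domain from the source
    statistics to the target conversational statistics. `flavour` scales
    the target spread: >1 is more expressive, 0 gives a flat contour.
    """
    src_mu, src_sd = src_stats
    tgt_mu, tgt_sd = tgt_stats
    out: List[Optional[float]] = []
    for v in f0:
        if v is None:
            out.append(None)  # keep unvoiced frames unvoiced
            continue
        z = (math.log(v) - src_mu) / src_sd if src_sd > 0 else 0.0
        out.append(math.exp(tgt_mu + flavour * tgt_sd * z))
    return out


def f0_conditioned_vc(utt: Utterance, target_speaker: str, f0) -> Utterance:
    """Placeholder for an F0-conditioned VC model (step 1).

    A real model would re-synthesise audio; here we only keep the text,
    relabel speaker and style, and attach the conditioning F0 contour.
    """
    return replace(utt, speaker=target_speaker, style="conversational", f0=list(f0))


def build_augmented_corpus(neutral_src, conv_target, target_speaker, flavour=1.0):
    """Convert non-expressive utterances into synthetic conversational data."""
    src_stats = log_f0_stats(neutral_src)
    tgt_stats = log_f0_stats(conv_target)
    synthetic = [
        f0_conditioned_vc(
            u, target_speaker,
            predict_conversational_f0(u.f0, src_stats, tgt_stats, flavour),
        )
        for u in neutral_src
    ]
    # Step 3 (not shown): a TTS system would be trained on this combined set.
    return list(conv_target) + synthetic
```

For example, with one neutral utterance whose voiced frames are 100 Hz and 200 Hz and a conversational target whose voiced frames are 150 Hz and 300 Hz, the synthetic copy gets the contour 150 Hz, unvoiced, 300 Hz at `flavour=1.0`, while `flavour=0.0` flattens every voiced frame to the target's geometric mean. A linear transform keeps the source's contour shape; a learned predictor, as described in the abstract, can instead generate conversational contours from text.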
