Paper title
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
Paper authors
Paper abstract
Speech provides a natural means of human-computer interaction. In particular, speech synthesis systems are widely used in applications such as personal assistants, GPS applications, screen readers, and accessibility tools. However, not all languages are at the same level in terms of resources and systems for speech synthesis. This work creates publicly available resources for Brazilian Portuguese in the form of a novel dataset, along with deep learning models for end-to-end speech synthesis. The dataset contains 10.5 hours of speech from a single speaker. Trained on it, a Tacotron 2 model with the RTISI-LA vocoder performed best, achieving a mean opinion score (MOS) of 4.03. The results are comparable to related work on English and to the state of the art in Portuguese.