Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

Paper title

Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

Authors

Liu, Rui, Sisman, Berrak, Bao, Feilong, Gao, Guanglai, Li, Haizhou

Abstract

Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both the Mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that our proposed training scheme consistently improves the voice quality for both Chinese and Mongolian systems.
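
The multi-task idea in the abstract can be illustrated with a small sketch of a joint training objective: one term for the Mel-spectrum prediction and one for the phrase-break prediction, added together. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the use of mean squared error for the Mel term, binary cross-entropy for the break term, and the `break_weight` value are all hypothetical.

```python
import math

def mel_mse(pred, target):
    """Mean squared error over a flat list of Mel-spectrum values (assumed loss form)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def break_cross_entropy(probs, labels):
    """Binary cross-entropy; probs[i] is the predicted probability of a
    phrase break after token i, labels[i] is 1 (break) or 0 (no break)."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def multitask_loss(mel_pred, mel_target, break_probs, break_labels, break_weight=0.5):
    """Joint objective: Mel loss plus a weighted phrase-break loss.
    break_weight is a hypothetical hyperparameter, not a value from the paper."""
    return mel_mse(mel_pred, mel_target) + break_weight * break_cross_entropy(
        break_probs, break_labels
    )

# Toy usage: two Mel values and two candidate break positions.
loss = multitask_loss([0.0, 0.0], [1.0, 3.0], [0.5, 0.5], [1, 0], break_weight=1.0)
print(loss)
```

In a setup like this, a single shared model would receive gradients from both terms, so the phrase-break labels act as an extra supervision signal alongside the Mel-spectrum target.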
