Paper Title

Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

Authors

Tomoki Koriyama, Hiroshi Saruwatari

Abstract

This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. DGP is a Bayesian deep model that can be trained effectively while accounting for model complexity, and it is a kernel regression model with high expressiveness. Previous studies showed that DGP-based speech synthesis outperformed neural-network-based synthesis when both models used a feed-forward architecture. To improve the naturalness of synthetic speech, this paper shows that DGP can also be applied to utterance-level modeling using a recurrent architecture. We adopt the simple recurrent unit (SRU) to realize the recurrent architecture of the proposed model; because SRU is highly parallelizable, the model supports fast speech parameter generation. Objective and subjective evaluation results show that the proposed SRU-DGP-based speech synthesis outperforms not only feed-forward DGP but also automatically tuned SRU- and long short-term memory (LSTM)-based neural networks.
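To illustrate the parallelism claim in the abstract, below is a minimal NumPy sketch of a single SRU layer in the form introduced by Lei et al., which the paper adopts. The layer width, initialization, and function names here are illustrative assumptions, not the authors' implementation. The key property: every matrix multiplication depends only on the input sequence, so it can be computed for all frames at once; only a cheap element-wise state update remains sequential.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(X, W, Wf, bf, Wr, br):
    """Run one SRU layer over an input sequence X of shape (T, d).

    Hypothetical sketch: all three matrix products below depend only
    on the inputs, so they are computed for every time step at once.
    Only the element-wise cell update stays inside the time loop,
    which is the source of SRU's high parallelism relative to LSTM.
    """
    X_tilde = X @ W            # (T, d) candidate states, fully parallel
    F = sigmoid(X @ Wf + bf)   # (T, d) forget gates, fully parallel
    R = sigmoid(X @ Wr + br)   # (T, d) reset gates, fully parallel

    T, d = X.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):         # only element-wise ops remain sequential
        c = F[t] * c + (1.0 - F[t]) * X_tilde[t]
        # highway connection mixing the cell output with the raw input
        H[t] = R[t] * np.tanh(c) + (1.0 - R[t]) * X[t]
    return H

# Toy usage: a 100-frame sequence of 32-dimensional acoustic features.
rng = np.random.default_rng(0)
T, d = 100, 32
X = rng.standard_normal((T, d))
W, Wf, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
H = sru_layer(X, W, Wf, np.zeros(d), Wr, np.zeros(d))
print(H.shape)  # (100, 32)
```

By contrast, an LSTM's gates depend on the previous hidden state, which forces every matrix multiplication inside the time loop; this difference is why the SRU-based model can generate speech parameters quickly, as the abstract notes.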
