Paper Title

Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

Authors

Tomoki Koriyama, Hiroshi Saruwatari

Abstract

This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. DGP is a Bayesian deep model that can be trained effectively while accounting for model complexity, and it is a kernel regression model with high expressiveness. Previous studies showed that DGP-based speech synthesis outperformed neural-network-based synthesis when both models used a feed-forward architecture. To improve the naturalness of synthetic speech, this paper shows that DGP can also be applied to utterance-level modeling using a recurrent architecture. We adopt the simple recurrent unit (SRU) to realize the recurrent architecture of the proposed model; because SRU is highly parallelizable, the model supports fast speech parameter generation. Objective and subjective evaluation results show that the proposed SRU-DGP-based speech synthesis outperforms not only feed-forward DGP but also automatically tuned SRU- and long short-term memory (LSTM)-based neural networks.
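To illustrate the parallelism claim in the abstract, below is a minimal NumPy sketch of a single SRU layer in the form introduced by Lei et al., which the paper adopts. The layer width, initialization, and function names here are illustrative assumptions, not the authors' implementation. The key property: every matrix multiplication depends only on the input sequence, so it can be computed for all frames at once; only a cheap element-wise state update remains sequential.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(X, W, Wf, bf, Wr, br):
    """Run one SRU layer over an input sequence X of shape (T, d).

    Hypothetical sketch: all three matrix products below depend only
    on the inputs, so they are computed for every time step at once.
    Only the element-wise cell update stays inside the time loop,
    which is the source of SRU's high parallelism relative to LSTM.
    """
    X_tilde = X @ W            # (T, d) candidate states, fully parallel
    F = sigmoid(X @ Wf + bf)   # (T, d) forget gates, fully parallel
    R = sigmoid(X @ Wr + br)   # (T, d) reset gates, fully parallel

    T, d = X.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):         # only element-wise ops remain sequential
        c = F[t] * c + (1.0 - F[t]) * X_tilde[t]
        # highway connection mixing the cell output with the raw input
        H[t] = R[t] * np.tanh(c) + (1.0 - R[t]) * X[t]
    return H

# Toy usage: a 100-frame sequence of 32-dimensional acoustic features.
rng = np.random.default_rng(0)
T, d = 100, 32
X = rng.standard_normal((T, d))
W, Wf, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
H = sru_layer(X, W, Wf, np.zeros(d), Wr, np.zeros(d))
print(H.shape)  # (100, 32)
```

By contrast, an LSTM's gates depend on the previous hidden state, which forces every matrix multiplication inside the time loop; this difference is why the SRU-based model can generate speech parameters quickly, as the abstract notes.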
