Paper title
Eigenresiduals for improved Parametric Speech Synthesis
Paper authors
Paper abstract
Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding and flexible voices. Unfortunately, the delivered quality suffers from a typical buzziness because the speech is vocoded. This paper proposes a new excitation model to reduce this undesirable effect. The model is based on the decomposition of pitch-synchronous residual frames on an orthonormal basis obtained by Principal Component Analysis (PCA). This basis contains a limited number of eigenresiduals and is computed on a relatively small speech database. A stream of PCA-based coefficients is added to our HMM-based synthesizer and allows the voiced excitation to be generated during synthesis. An improvement over the traditional excitation is reported, while the synthesis engine footprint remains under about 1Mb.
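The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pitch-synchronous residual frames have already been extracted and resampled to a common length, and it uses random data as a stand-in for real frames. The function names (`compute_eigenresiduals`, `encode`, `decode`), the frame length of 64, and the choice of 8 components are hypothetical and chosen only for illustration.

```python
import numpy as np

def compute_eigenresiduals(frames, n_components):
    """Return the mean frame and the first n_components principal directions.

    frames: array of shape (n_frames, frame_len); assumed to hold residual
    frames already normalized to a common length (an assumption of this sketch).
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # SVD of the centered data: the rows of vt are orthonormal principal
    # directions, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def encode(frame, mean, basis):
    """Project one frame onto the basis, giving its PCA coefficients."""
    return basis @ (frame - mean)

def decode(coeffs, mean, basis):
    """Rebuild an approximate frame from its PCA coefficients."""
    return mean + basis.T @ coeffs

# Synthetic stand-in for a small database of residual frames (hypothetical data).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 64))

mean, basis = compute_eigenresiduals(frames, n_components=8)
coeffs = encode(frames[0], mean, basis)   # compact per-frame representation
approx = decode(coeffs, mean, basis)      # approximate frame rebuilt from coeffs
```

In this sketch only the mean frame, the small basis, and a short coefficient vector per frame are needed to rebuild an approximate frame, which is consistent with the abstract's point about a limited number of eigenresiduals and a small footprint.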