Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

Paper title

Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

Authors

Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Abstract

In this paper, we propose computationally efficient and high-quality methods for statistical voice conversion (VC) with direct waveform modification based on spectral differentials. The conventional method with a minimum-phase filter achieves high-quality conversion but requires heavy computation in filtering. This is because the minimum phase obtained with a fixed lifter of the Hilbert transform often results in a long-tap filter. One of our methods is a data-driven method for lifter training. Since this method takes filter truncation into account in training, it can shorten the tap length of the filter while preserving conversion accuracy. Our other method is sub-band processing for extending the conventional method from narrow-band (16 kHz) to full-band (48 kHz) VC, which can convert a full-band waveform with higher converted-speech quality. Experimental results indicate that 1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading the converted-speech quality and 2) the proposed sub-band-processing method for full-band VC can improve the converted-speech quality over the conventional method.
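
To make the "fixed lifter" step concrete, here is a minimal NumPy sketch of the conventional route the abstract refers to: a log-magnitude spectrum is converted to a real cepstrum, a fixed lifter folds it into a causal (minimum-phase) cepstrum, and the resulting impulse response is truncated to a chosen tap length. This illustrates the textbook homomorphic construction only, not the proposed trained lifter. The function name `min_phase_filter` and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np


def min_phase_filter(log_mag, n_taps):
    """Build a truncated minimum-phase FIR filter from a log-magnitude spectrum.

    log_mag: log-magnitude sampled on a full, symmetric FFT grid of even length.
    n_taps:  number of filter taps kept after truncation (an assumption of this
             sketch; the abstract does not specify how taps are selected).
    """
    n_fft = len(log_mag)
    # Real cepstrum of the (zero-phase) log-magnitude spectrum.
    cep = np.fft.ifft(log_mag).real
    # Fixed lifter: keep c[0] and c[N/2], double the positive-quefrency part,
    # and zero the negative-quefrency part.
    lifter = np.zeros(n_fft)
    lifter[0] = 1.0
    lifter[1:n_fft // 2] = 2.0
    lifter[n_fft // 2] = 1.0
    min_cep = lifter * cep
    # Back to the time domain: exponentiate the complex log spectrum.
    h = np.fft.ifft(np.exp(np.fft.fft(min_cep))).real
    # Plain truncation; a long tail here is what makes filtering expensive.
    return h[:n_taps]
```

With a fixed lifter, nothing in the design accounts for the truncation in the last line, so a short `n_taps` can discard a significant tail. The abstract's lifter-training method instead takes this truncation into account during training.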
