SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Paper title

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Authors

Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani

Abstract

Neural vocoder using denoising diffusion probabilistic model (DDPM) has been improved by adaptation of the diffusion noise distribution to given acoustic features. In this study, we propose SpecGrad that adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality especially in the high-frequency bands. It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveform than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
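
The time-frequency-domain filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `shape_noise`, the frame length, and the low-pass `gains` envelope are all hypothetical, and a real system would presumably derive the envelope from the conditioning log-mel spectrogram. Plain Python with a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(n^2) DFT; adequate for the tiny frames used in this sketch."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, keeping only the real part."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def shape_noise(noise, gains, frame_len=16):
    """Filter `noise` frame by frame: STFT -> per-bin gain -> inverse STFT.

    gains[m][k] is an assumed magnitude gain for frame m and DFT bin k.
    A sqrt-Hann window at 50% overlap is applied at both analysis and
    synthesis, so the squared windows sum to one and unit gains return
    the input unchanged (except in the first and last half frame).
    """
    hop = frame_len // 2
    win = [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len))
           for t in range(frame_len)]
    out = [0.0] * len(noise)
    n_frames = (len(noise) - frame_len) // hop + 1
    for m in range(n_frames):
        start = m * hop
        frame = [noise[start + t] * win[t] for t in range(frame_len)]
        spec = dft(frame)
        shaped = [spec[k] * gains[m][k] for k in range(frame_len)]
        # Overlap-add the windowed, filtered frame back into the output.
        for t, v in enumerate(idft(shaped)):
            out[start + t] += v * win[t]
    return out


# Usage: shape white Gaussian noise with a made-up, time-varying low-pass envelope.
random.seed(0)
frame_len, n_frames = 16, 20
white = [random.gauss(0.0, 1.0) for _ in range((frame_len // 2) * (n_frames + 1))]
gains = [[1.0 if min(k, frame_len - k) <= 2 + m % 4 else 0.1
          for k in range(frame_len)]
         for m in range(n_frames)]
shaped = shape_noise(white, gains, frame_len)
```

Because the gain is applied per frame in the transform domain, the filter can change over time at roughly the cost of one forward and one inverse transform, which is consistent with the abstract's remark that the computational cost stays close to that of a conventional DDPM-based vocoder.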
