Online Speaker Adaptation for WaveNet-based Neural Vocoders

Paper Title

Online Speaker Adaptation for WaveNet-based Neural Vocoders

Authors

Huang, Qiuchen; Ai, Yang; Ling, Zhenhua

Abstract

In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders in order to improve their performance on speaker-independent waveform generation. In this method, a speaker encoder is first constructed using a large speaker-verification dataset; the encoder can extract a speaker embedding vector from an utterance spoken by an arbitrary speaker. At the training stage, a speaker-aware WaveNet vocoder is then built using a multi-speaker dataset; the vocoder takes both acoustic feature sequences and speaker embedding vectors as conditions. At the generation stage, we first feed the acoustic feature sequence from a test speaker into the speaker encoder to obtain the speaker embedding vector of the utterance. Then, both the speaker embedding vector and the acoustic features are passed through the speaker-aware WaveNet vocoder to reconstruct speech waveforms. Experimental results demonstrate that our method achieves better objective and subjective performance in reconstructing waveforms of unseen speakers than the conventional speaker-independent WaveNet vocoder.
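The generation-stage data flow described in the abstract can be illustrated with a minimal sketch. The abstract only states that acoustic features and the speaker embedding are both used as conditions; the frame-wise concatenation below is one common way to combine them and is an assumption, not the authors' confirmed design. The function `mean_pool` is a hypothetical stand-in for the trained speaker encoder, and `build_conditions` is likewise an illustrative name.

```python
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Hypothetical placeholder for the speaker encoder.

    The real encoder is a network trained on speaker-verification data;
    here we simply average the frames into one utterance-level vector.
    """
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def build_conditions(acoustic: List[Vector], speaker_embedding: Vector) -> List[Vector]:
    """Repeat the utterance-level embedding at every frame and append it
    to that frame's acoustic features (assumed conditioning scheme)."""
    return [frame + speaker_embedding for frame in acoustic]


# Toy acoustic feature sequence: 3 frames, 2 features per frame.
acoustic_features = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]

# Step 1: acoustic features of the test utterance -> speaker embedding.
embedding = mean_pool(acoustic_features)

# Step 2: acoustic features + embedding -> conditioning input for the vocoder.
conditions = build_conditions(acoustic_features, embedding)
```

Each row of `conditions` would then serve as the per-frame conditioning input to a speaker-aware vocoder. Because the embedding is computed from the test utterance itself, no retraining is needed for an unseen speaker, which is what makes the adaptation "online".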

Paper Title