Paper Title
Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer
Paper Authors
Abstract
This paper presents a high-quality singing synthesizer that can model a voice from a limited amount of recordings. Building on a sequence-to-sequence singing model, we design a multi-singer framework that leverages all existing singing data from different singers. To mitigate the imbalance of musical scores among singers, we add an adversarial singer-classification task that makes the encoder output less singer dependent. Furthermore, we apply multiple random window discriminators (MRWDs) to the generated acoustic features, turning the network into a GAN. Both objective and subjective evaluations indicate that the proposed synthesizer generates higher-quality singing voices than the baseline (MOS of 4.12 vs. 3.53). In particular, the articulation of high-pitched vowels is significantly enhanced.
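As a rough illustration of the random-window idea behind the MRWDs, the sketch below crops one randomly positioned, contiguous window per window size from a sequence of acoustic feature frames. In a GAN setup, each crop would be passed to its own discriminator. The abstract does not give window sizes, feature dimensions, or discriminator architectures, so the function name `sample_random_windows`, the sizes 25, 50, and 100, and the toy features are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_random_windows(frames, window_sizes, rng=None):
    """Crop one random contiguous window per size from a frame sequence.

    frames: list of per-frame feature vectors (hypothetical layout).
    window_sizes: iterable of window lengths in frames (assumed values).
    Returns a dict mapping each window size to its cropped frames.
    """
    rng = rng or random.Random()
    windows = {}
    for size in window_sizes:
        if size > len(frames):
            raise ValueError(
                f"window size {size} exceeds sequence length {len(frames)}"
            )
        # randint is inclusive on both ends, so every valid start is reachable.
        start = rng.randint(0, len(frames) - size)
        windows[size] = frames[start:start + size]
    return windows


# Toy "acoustic features": 200 frames with 4 dimensions each (assumed shape).
frames = [[float(t)] * 4 for t in range(200)]
windows = sample_random_windows(
    frames, window_sizes=(25, 50, 100), rng=random.Random(0)
)
for size, crop in windows.items():
    print(size, len(crop))
```

Cropping windows of several lengths lets short-window discriminators focus on local detail while longer windows capture broader temporal structure. Whether the paper's discriminators use this exact scheme is not stated in the abstract.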