Paper Title

Many-to-Many Voice Transformer Network

Paper Authors

Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda

Paper Abstract

This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a mapping of speech feature sequences from one speaker to another. The main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers. This extension, called the many-to-many VTN, makes it possible to fully use the available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers. It also allows us to introduce a training loss called the identity mapping loss, which ensures that the input feature sequence remains unchanged when the source and target speaker indices are the same. Using this particular loss for model training has been found to be extremely effective in improving the performance of the model at test time. We conducted speaker identity conversion experiments and found that our model obtained higher sound quality and speaker similarity than baseline methods. We also found that our model, with a slight modification to its architecture, could handle any-to-many conversion tasks reasonably well.
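The identity mapping loss mentioned in the abstract can be illustrated with a minimal sketch. The convert(features, src, tgt) function below is a hypothetical stand-in for the many-to-many VTN, not the authors' actual implementation, and the abstract does not specify whether an L1 or L2 penalty is used; an L1 penalty is shown here purely for illustration.

    import torch

    def identity_mapping_loss(convert, feats, spk):
        # Penalize any change to the input when source and target speaker
        # indices coincide: convert(x, s, s) should reproduce x.
        reconstructed = convert(feats, src=spk, tgt=spk)
        return torch.mean(torch.abs(reconstructed - feats))

    # Toy check with an identity "converter" standing in for the model.
    dummy_convert = lambda x, src, tgt: x
    feats = torch.randn(1, 80, 120)  # (batch, mel bins, frames); illustrative shape
    print(identity_mapping_loss(dummy_convert, feats, spk=0))  # tensor(0.)

In training, this term would be added to the usual conversion loss so that "self-conversion" (same source and target speaker index) acts as a regularizer, which the abstract reports is very effective at test time.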
