Paper title
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Authors
Abstract
Recent research has shown that an autoencoder trained on the speech of a single speaker, called an exemplar autoencoder (eAE), can be used for any-to-one voice conversion (VC). Compared to large-scale many-to-many models such as AutoVC, the eAE model is easy and fast to train, and may recover more details of the target speaker. To ensure VC quality, the latent code should represent content information and nothing else. However, this is hard to achieve for an eAE, because it never sees any speaker variation during training. To tackle this problem, we propose a simple yet effective approach based on a cycle consistency loss. Specifically, we train eAEs for multiple speakers with a shared encoder, and at the same time encourage the speech reconstructed by any speaker-specific decoder to yield the same latent code as the original speech when it is cycled back and encoded again. Experiments conducted on the AISHELL-3 corpus show that the new approach consistently improves on the baseline eAE. The source code and examples are available at the project page: http://project.cslt.org/.
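The cycle consistency idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder and decoders, the squared-error form of both loss terms, the averaging over decoders, the `cycle_weight` parameter, and all names (`eae_cycle_loss`, `spk_a`, `spk_b`) are assumptions made for illustration only. The sketch shows the structure of the objective: one shared encoder, one decoder per speaker, a reconstruction term for the source speaker, and a cycle term that compares the original latent code with the code obtained after decoding and re-encoding.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for untrained network weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def eae_cycle_loss(frame, speaker, encoder, decoders, cycle_weight=1.0):
    """Return (total, reconstruction, cycle) loss for one feature frame.

    encoder  : shared encoder weights (hypothetical linear map)
    decoders : dict mapping speaker id -> decoder weights (hypothetical)
    """
    z = matvec(encoder, frame)  # latent code of the original frame

    # Reconstruction term: decode with the source speaker's own decoder.
    recon = sq_dist(frame, matvec(decoders[speaker], z))

    # Cycle term: decode with every speaker's decoder, encode again,
    # and compare the cycled code with the original code.
    cycle = 0.0
    for dec in decoders.values():
        converted = matvec(dec, z)
        z_cycled = matvec(encoder, converted)
        cycle += sq_dist(z, z_cycled)
    cycle /= len(decoders)  # averaging is an assumption of this sketch

    return recon + cycle_weight * cycle, recon, cycle


rng = random.Random(0)
feat_dim, code_dim = 8, 3  # arbitrary toy sizes
encoder = rand_matrix(code_dim, feat_dim, rng)
decoders = {spk: rand_matrix(feat_dim, code_dim, rng) for spk in ("spk_a", "spk_b")}
frame = [rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]

total, recon, cycle = eae_cycle_loss(frame, "spk_a", encoder, decoders, cycle_weight=0.5)
print(f"total={total:.4f} recon={recon:.4f} cycle={cycle:.4f}")
```

In a real system the encoder and decoders would be neural networks trained by gradient descent on this kind of objective; the sketch only evaluates the loss once to make its two terms explicit.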