Paper Title
SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks
Paper Authors
Abstract
SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner. However, SpeechSplit requires careful tuning of the autoencoder bottlenecks, which can be time-consuming and less robust. This paper proposes SpeechSplit 2.0, which constrains the information flow of the speech component to be disentangled on the autoencoder input using efficient signal processing methods instead of bottleneck tuning. Evaluation results show that SpeechSplit 2.0 achieves comparable performance to SpeechSplit in speech disentanglement and superior robustness to the bottleneck size variations. Our code is available at https://github.com/biggytruck/SpeechSplit2.
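One family of signal-processing constraints the abstract alludes to is normalizing the pitch contour before it reaches an autoencoder input, so the encoder cannot leak speaker-dependent pitch range (a timbre cue). The sketch below is purely illustrative, not the paper's actual pipeline: it shows per-utterance z-normalization of a log-F0 contour, with the function name and details being assumptions for this example.

```python
import numpy as np

def normalize_pitch_contour(f0, eps=1e-8):
    """Illustrative sketch (not SpeechSplit 2.0's exact method):
    z-normalize the log-F0 contour of one utterance.

    Removing the utterance-level mean and variance strips the
    speaker-dependent pitch register while preserving the intonation
    shape. Unvoiced frames (f0 == 0) are kept at zero.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if not voiced.any():
        return np.zeros_like(f0)
    log_f0 = np.log(f0[voiced])
    mu, sigma = log_f0.mean(), log_f0.std()
    out = np.zeros_like(f0)
    out[voiced] = (log_f0 - mu) / (sigma + eps)
    return out
```

Feeding such a normalized contour to the model constrains the information available at the input itself, which is the kind of alternative to bottleneck-size tuning the abstract describes.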