Paper title
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Paper authors
Paper abstract
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.