ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

Paper title

ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

Authors

Edresson Casanova, Christopher Shulby, Alexander Korolev, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Aluísio, Moacir Antonelli Ponti

Abstract

We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
