Paper title

Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

Authors

Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang

Abstract

In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech in support of speaker recognition. In this study we focus the analysis on tasks where a relatively small number of speakers is available for training. We observe on our datasets that TTS synthesized speech improves cross-domain speaker recognition performance and can be combined effectively with multi-style training. Additionally, we explore the effectiveness of different types of text transcripts used for TTS synthesis. Results suggest that matching the textual content of the target domain is a good practice, and if that is not feasible, a transcript with a sufficiently large vocabulary is recommended.
