Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

Paper title

Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

Authors

Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang

Abstract

In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech in support of speaker recognition. In this study we focus the analysis on tasks where a relatively small number of speakers is available for training. We observe on our datasets that TTS synthesized speech improves cross-domain speaker recognition performance and can be combined effectively with multi-style training. Additionally, we explore the effectiveness of different types of text transcripts used for TTS synthesis. Results suggest that matching the textual content of the target domain is a good practice, and if that is not feasible, a transcript with a sufficiently large vocabulary is recommended.
