Paper Title

10 hours data is all you need

Paper Authors

Zeping Min, Qian Ge, Zhong Li

Paper Abstract

We propose a novel procedure to generate pseudo Mandarin speech data, named CAMP (character audio mix up), which aims at generating audio at the character scale. We also propose a method for building a Mandarin character-scale audio database adapted to CAMP, named META-AUDIO, which makes full use of audio data and can greatly increase the data diversity of the database. Experiments show that our CAMP method is simple and quite effective. For example, we train models with 10 hours of audio data from AISHELL-1 plus pseudo audio data generated by CAMP, and achieve a competitive 11.07 character error rate (CER). Besides, training with only 10 hours of audio data from the AIDATATANG dataset plus pseudo audio data generated by CAMP again achieves a competitive 8.26 CER.
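
To make the idea concrete, below is a minimal sketch of how CAMP-style pseudo audio could be assembled from a character-scale audio database of the kind META-AUDIO is described as providing: for each character in a transcript, sample one of its recorded snippets and concatenate the snippets into a pseudo utterance. The database layout, the function name camp_generate, and the uniform sampling strategy are illustrative assumptions, not the authors' exact implementation.

```python
import random
import numpy as np

# Hypothetical character-scale audio database (META-AUDIO style):
# each Mandarin character maps to a list of waveform snippets.
# Placeholder zero arrays stand in for real recorded audio.
char_audio_db = {
    "你": [np.zeros(1600, dtype=np.float32)],
    "好": [np.zeros(1600, dtype=np.float32)],
}

def camp_generate(transcript: str, db: dict) -> np.ndarray:
    """Build a pseudo utterance by sampling one audio snippet per character
    in the transcript and concatenating the snippets in order."""
    pieces = []
    for ch in transcript:
        snippets = db.get(ch)
        if not snippets:
            continue  # skip characters absent from the database
        pieces.append(random.choice(snippets))
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)

# Example: generate pseudo audio for a two-character transcript.
pseudo_wave = camp_generate("你好", char_audio_db)
```

In this sketch, the generated pseudo utterances, paired with their transcripts, would be mixed with the small real dataset (e.g., the 10-hour AISHELL-1 subset) for training.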
