Paper title

Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge

Authors

Guoguo Chen, Xingyu Na, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Sifan Ma, Yujun Wang

Abstract

This paper presents the "Ethiopian" system for the SLT 2021 Children Speech Recognition Challenge. Various data processing and augmentation techniques are proposed to tackle the children's speech recognition problem, in particular the scarcity of children's speech training data. Detailed experiments are designed and conducted to show the effectiveness of each technique across different speech recognition toolkits and model architectures. Step by step, we explain how we arrived at our final system, which achieves state-of-the-art results in the SLT 2021 Children Speech Recognition Challenge: 21.66% character error rate (CER) on the Track 1 evaluation set (4th place overall) and 16.53% CER on the Track 2 evaluation set (1st place overall). Post-challenge analysis shows that our system actually achieves 18.82% CER on the Track 1 evaluation set, but we submitted the wrong version to the challenge organizer for Track 1.
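The results above are reported as CER. For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level character error rate is commonly computed: the character-level Levenshtein distance between each reference and hypothesis, summed and divided by the total number of reference characters. The function names `edit_distance` and `cer` and the example sentences are illustrative assumptions. The challenge's actual scoring script and text normalization are not described here and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character errors / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_chars


if __name__ == "__main__":
    # Made-up example pair (not challenge data): one substitution, one deletion.
    refs = ["今天天气很好"]
    hyps = ["今天天汽好"]
    print(f"CER = {cer(refs, hyps):.2%}")
```

Because the denominator counts reference characters only, a hypothesis with many insertions can push the CER above 100%.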
