Paper Title
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Paper Authors
Paper Abstract
Self-supervised learning (SSL) in the pretraining stage using unannotated speech data has been successful in low-resource automatic speech recognition (ASR) tasks. However, models trained through SSL are biased toward the pretraining data, which usually differs from the data used in finetuning tasks, causing a domain shifting problem and thus limiting knowledge transfer. We propose a novel framework, domain responsible adaptation and finetuning (DRAFT), to reduce domain shifting in pretrained speech models through an additional adaptation stage. In DRAFT, residual adapters (RAs) are inserted into the pretrained model to learn domain-related information with the same SSL loss as the pretraining stage, and only the RA parameters are updated during the adaptation stage. DRAFT is agnostic to the type of SSL method used and is evaluated with three widely used approaches: APC, Wav2vec2.0, and HuBERT. On two child ASR tasks (OGI and MyST databases), using SSL models trained with unannotated adult speech data (LibriSpeech), relative WER improvements of up to 19.7% are observed when compared to the pretrained models without adaptation. Additional experiments examined the potential of cross-knowledge transfer between the two datasets, and the promising results suggest broader applicability of the proposed DRAFT framework.
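To illustrate the adaptation stage described in the abstract, the following is a minimal sketch in PyTorch (module and parameter names are hypothetical, not the authors' released code) of a residual adapter and of freezing everything except the adapter parameters so that only they are updated with the SSL loss.

```python
# Minimal sketch of a DRAFT-style residual adapter (RA) and adaptation-stage
# parameter freezing. Assumed PyTorch; names are illustrative only.
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Bottleneck adapter applied to a pretrained layer's output."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained representation intact;
        # the adapter only learns a domain-related correction.
        return x + self.up(torch.relu(self.down(self.norm(x))))


def freeze_except_adapters(model: nn.Module) -> None:
    """Adaptation stage: train only adapter parameters with the SSL loss."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

After adaptation, the adapters stay in place and the whole model is finetuned on the target (child speech) ASR task as usual.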