Paper Title

Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

Paper Authors

Hang Ji, Tanvina Patel, Odette Scharenborg

Paper Abstract

In this work, we analyzed and compared speech representations extracted from different frozen self-supervised learning (SSL) speech pre-trained models on their ability to capture articulatory feature (AF) information, and on how well that ability predicts phoneme recognition performance in within-language and across-language scenarios. Specifically, we compared CPC, wav2vec 2.0, and HuBERT. First, frame-level AF probing tasks were implemented. Subsequently, phone-level end-to-end ASR systems were implemented for the phoneme recognition task, and performance on the frame-level AF probing tasks was correlated with phone accuracy. Compared to the conventional MFCC speech representation, all SSL pre-trained speech representations captured more AF information and achieved better phoneme recognition performance both within and across languages, with HuBERT performing best. The frame-level AF probing task is a good predictor of phoneme recognition performance, showing the importance of capturing AF information in the speech representations. Compared to MFCC, in the within-language scenario the SSL speech pre-trained models achieved a maximum relative improvement of 34.4% on the AF probing tasks, which corresponded to the lowest phone error rate (PER) of 10.2%. In the across-language scenario, the maximum relative improvement of 26.7% likewise corresponded to the lowest PER of 23.0%.
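The pipeline the abstract describes (frozen SSL features feeding a frame-level AF probe) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the paper's actual code: the Hugging Face Transformers checkpoint `facebook/hubert-base-ls960`, the six-class AF group, the linear probe, and the use of the last hidden layer are all assumptions for the example; the paper's probe architecture, layer choices, and AF label sets may differ.

```python
# Minimal sketch of a frame-level AF probing setup on frozen SSL features.
# Assumptions (not from the paper): the checkpoint name, a 6-class AF group,
# a linear probe, and the use of the last hidden layer.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, HubertModel

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
ssl_model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
ssl_model.eval()  # frozen: the pre-trained model is never fine-tuned

NUM_AF_CLASSES = 6  # hypothetical AF group, e.g. manner of articulation
probe = nn.Linear(ssl_model.config.hidden_size, NUM_AF_CLASSES)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def frame_features(waveform: torch.Tensor, sr: int = 16_000) -> torch.Tensor:
    """Return one representation vector per ~20 ms frame from the frozen model."""
    inputs = extractor(waveform.numpy(), sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():  # no gradients flow into the SSL model
        hidden = ssl_model(inputs.input_values).last_hidden_state  # (1, T, D)
    return hidden.squeeze(0)  # (T, D)

def probe_step(waveform: torch.Tensor, frame_labels: torch.Tensor) -> float:
    """One training step: classify each frame's AF value; update the probe only."""
    feats = frame_features(waveform)        # (T, D)
    logits = probe(feats)                   # (T, NUM_AF_CLASSES)
    loss = criterion(logits, frame_labels)  # frame_labels: (T,) int64 AF ids
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Per-model probing accuracies obtained this way can then be set against each model's PER from a separately trained phone-level ASR system (for instance with scipy.stats.pearsonr) to test how well AF probing predicts phoneme recognition performance.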
