Paper Title

WavFT: Acoustic model finetuning with labelled and unlabelled data

Paper Authors

Utkarsh Chauhan, Vikas Joshi, Rupesh R. Mehta

Paper Abstract

Unsupervised and self-supervised learning methods have leveraged unlabelled data to improve pretrained models. However, these methods need a significantly large amount of unlabelled data, and the computational cost of training models on that much data can be prohibitively high. We address this issue by using unlabelled data during finetuning instead of pretraining. We propose acoustic model finetuning (FT) using both labelled and unlabelled data. The model is jointly trained to learn representations for classifying senones and to learn contextual acoustic representations. Our training objective is a combination of cross-entropy loss, suited to the classification task, and contrastive loss, suited to learning acoustic representations. The proposed approach outperforms conventional finetuning with 11.2% and 9.19% relative word error rate (WERR) reductions on Gujarati and Bengali respectively.
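To make the joint objective concrete, below is a minimal PyTorch sketch, assuming an InfoNCE-style contrastive term and a simple weighted sum of the two losses. The weighting factor lam, the temperature, the cosine-similarity formulation, and all tensor shapes are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def joint_finetuning_loss(senone_logits, senone_targets,
                          context, positives, negatives,
                          lam=0.5, temperature=0.1):
    # Supervised branch: cross-entropy over senone labels (labelled data).
    # senone_logits: (N, num_senones), senone_targets: (N,)
    ce_loss = F.cross_entropy(senone_logits, senone_targets)

    # Self-supervised branch: InfoNCE-style contrastive loss that pulls each
    # context vector towards its true target representation and away from K
    # distractors (unlabelled data).
    # context, positives: (B, T, D); negatives: (B, T, K, D)
    pos_sim = F.cosine_similarity(context, positives, dim=-1)               # (B, T)
    neg_sim = F.cosine_similarity(context.unsqueeze(2), negatives, dim=-1)  # (B, T, K)
    logits = torch.cat([pos_sim.unsqueeze(2), neg_sim], dim=2) / temperature
    # The positive candidate sits at index 0 of each (K+1)-way softmax.
    targets = torch.zeros(logits.shape[0] * logits.shape[1], dtype=torch.long)
    contrastive_loss = F.cross_entropy(logits.flatten(0, 1), targets)

    # Weighted combination of the two objectives (lam is a hypothetical knob).
    return ce_loss + lam * contrastive_loss

# Toy usage with random tensors: batch 2, 50 frames, 256-dim features,
# 3000 senone classes, 10 negative samples per frame.
B, T, D, C, K = 2, 50, 256, 3000, 10
loss = joint_finetuning_loss(
    torch.randn(B * T, C),
    torch.randint(0, C, (B * T,)),
    torch.randn(B, T, D),
    torch.randn(B, T, D),
    torch.randn(B, T, K, D),
)
print(loss.item())

In this sketch the cross-entropy term is computed only on frames that have senone labels, while the contrastive term can be computed on any audio, which is what lets finetuning exploit unlabelled data.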
