Paper Title

Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models

Paper Authors

Sukhadia, Vrunda N., Umesh, S.

Paper Abstract

In this paper, we investigate domain adaptation for low-resource Automatic Speech Recognition (ASR) of target-domain data, when a well-trained ASR model trained with a large dataset is available. We argue that in the encoder-decoder framework, the decoder of the well-trained ASR model is largely tuned towards the source-domain, hurting the performance of target-domain models in vanilla transfer-learning. On the other hand, the encoder layers of the well-trained ASR model mostly capture the acoustic characteristics. We, therefore, propose to use the embeddings tapped from these encoder layers as features for a downstream Conformer target-domain model and show that they provide significant improvements. We do ablation studies on which encoder layer is optimal to tap the embeddings, as well as the effect of freezing or updating the well-trained ASR model's encoder layers. We further show that applying Spectral Augmentation (SpecAug) on the proposed features (this is in addition to default SpecAug on input spectral features) provides a further improvement on the target-domain performance. For the LibriSpeech-100-clean data as target-domain and SPGI-5000 as a well-trained model, we get 30% relative improvement over baseline. Similarly, with WSJ data as target-domain and LibriSpeech-960 as a well-trained model, we get 50% relative improvement over baseline.
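
To make the described pipeline concrete, below is a minimal PyTorch sketch of the idea: tap embeddings from an intermediate encoder layer of the well-trained source-domain model, optionally freeze those layers, apply SpecAug-style masking to the tapped embeddings (in addition to the default SpecAug on the input features), and feed the result to a downstream Conformer. The class name `TappedFeatureFrontend`, the `tap_layer` argument, the toy linear encoder, and all masking hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torchaudio

class TappedFeatureFrontend(nn.Module):
    """Tap embeddings from layer `tap_layer` of a well-trained encoder
    and apply SpecAug-style masking to them (hypothetical sketch)."""

    def __init__(self, source_encoder_layers: nn.ModuleList,
                 tap_layer: int, freeze: bool = True):
        super().__init__()
        # Keep only the encoder layers up to and including the tap point.
        self.layers = source_encoder_layers[: tap_layer + 1]
        if freeze:  # the paper ablates frozen vs. updated encoder layers
            for p in self.layers.parameters():
                p.requires_grad = False
        # SpecAug on the tapped embeddings; mask widths are arbitrary here.
        self.freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=20)
        self.time_mask = torchaudio.transforms.TimeMasking(time_mask_param=40)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat) spectral features of the target-domain data
        for layer in self.layers:
            x = layer(x)
        if self.training:
            # masking transforms expect (..., feat, time), so transpose around them
            x = self.time_mask(self.freq_mask(x.transpose(1, 2))).transpose(1, 2)
        return x

# Toy stand-in for the well-trained encoder (in practice: Conformer blocks).
enc = nn.ModuleList(
    [nn.Sequential(nn.Linear(80, 80), nn.ReLU()) for _ in range(12)])
frontend = TappedFeatureFrontend(enc, tap_layer=7)
feats = frontend(torch.randn(4, 200, 80))  # (batch, time, feat)

# The tapped embeddings then serve as input features for a downstream
# target-domain Conformer, e.g. torchaudio's reference implementation.
downstream = torchaudio.models.Conformer(
    input_dim=80, num_heads=4, ffn_dim=256,
    num_layers=4, depthwise_conv_kernel_size=31)
lengths = torch.full((4,), 200)
out, out_lengths = downstream(feats, lengths)
```

The choice of `tap_layer` matters: the abstract reports ablations over which encoder layer is optimal to tap, reflecting that lower layers capture mostly acoustic characteristics while deeper layers become increasingly source-domain specific.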
