Paper Title
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Paper Authors
Paper Abstract
SpecAugment is a very effective data augmentation method for both HMM-based and E2E-based automatic speech recognition (ASR) systems. In particular, it also works in low-resource scenarios. However, SpecAugment masks the spectrum in the time or frequency domain with a fixed augmentation policy, which may bring relatively little data diversity to low-resource ASR. In this paper, we propose a policy-based SpecAugment (Policy-SpecAugment) method to alleviate this problem. The idea is to use an augmentation-select policy and an augmentation-parameter changing policy in place of the fixed scheme. These policies are learned from the loss on the validation set, which is mapped to the corresponding augmentation policies. The aim is to encourage the model to learn the more diverse data that it relatively needs. In experiments, we evaluate the effectiveness of our approach in a low-resource scenario, i.e., the 100-hour LibriSpeech task. According to the results and analysis, the above issue can be clearly alleviated by our proposal. In addition, the experimental results show that, compared with the state-of-the-art SpecAugment, the proposed Policy-SpecAugment achieves a relative WER reduction of more than 10% on the test/dev-clean sets, more than 5% on the test/dev-other sets, and an absolute WER reduction of more than 1% on all test sets.
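For context, a minimal sketch of the standard fixed-policy SpecAugment masking that the abstract contrasts against, assuming a log-mel spectrogram stored as a NumPy array of shape (time, freq). The mask counts and widths here are illustrative defaults, not the paper's values; in Policy-SpecAugment these choices would instead be driven by the learned policies based on the validation loss.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=100, rng=None):
    """Apply frequency and time masking to a (time, freq) spectrogram.

    Fixed-policy variant: mask counts/widths are constants. The paper's
    policies would select which augmentations to apply and adapt these
    parameters per epoch from the validation loss.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()  # do not modify the caller's array
    n_time, n_freq = out.shape
    # Frequency masking: zero out `w` consecutive mel channels.
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, freq_mask_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w + 1)))
        out[:, f0:f0 + w] = 0.0
    # Time masking: zero out `w` consecutive frames.
    for _ in range(num_time_masks):
        w = int(rng.integers(0, time_mask_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w + 1)))
        out[t0:t0 + w, :] = 0.0
    return out
```

Masking with zeros (rather than the mean) and sampling mask widths uniformly from [0, width] follows the common SpecAugment formulation; time warping is omitted for brevity.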