Paper Title


Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

Authors

Xun Gong, Zhikai Zhou, Yanmin Qian

Abstract


Modern non-autoregressive (NAR) speech recognition systems aim to accelerate inference; however, they suffer from performance degradation compared with autoregressive (AR) models, as well as from large model sizes. We propose a novel knowledge transfer and distillation architecture that leverages knowledge from AR models to improve NAR performance while reducing the model size. Frame- and sequence-level objectives are carefully designed for transfer learning. To further boost NAR performance, a beam search method on Mask-CTC is developed to enlarge the search space during inference. Experiments show that the proposed NAR beam search yields a relative CER reduction of over 5% on the AISHELL-1 benchmark with a tolerable real-time factor (RTF) increase. With knowledge transfer, a NAR student of the same size as the AR teacher obtains relative CER reductions of 8%/16% on the AISHELL-1 dev/test sets, and over 25% relative WER reductions on the LibriSpeech test-clean/other sets. Moreover, with the proposed knowledge transfer and distillation, NAR models roughly 9x smaller achieve about 25% relative CER/WER reductions on both the AISHELL-1 and LibriSpeech benchmarks.
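To illustrate the kind of frame-level transfer objective the abstract mentions, the sketch below shows a generic knowledge-distillation loss between an AR teacher's and a NAR student's per-frame output distributions. The function name, tensor shapes, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumed, not the paper's formulation): a generic frame-level
# knowledge-distillation loss, computed as the KL divergence between the
# teacher's and student's per-frame output distributions.
import torch
import torch.nn.functional as F


def frame_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) averaged over all frames.

    Both tensors are expected to have shape (batch, frames, vocab).
    """
    t = temperature
    vocab = student_logits.size(-1)
    # Flatten batch and time so the divergence is averaged over every frame.
    log_p_student = F.log_softmax(student_logits / t, dim=-1).reshape(-1, vocab)
    p_teacher = F.softmax(teacher_logits / t, dim=-1).reshape(-1, vocab)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```

A sequence-level objective is commonly realized by training the student on hypotheses decoded from the teacher rather than on per-frame posteriors; the paper's specific design may differ.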
