Title
Iterative Compression of End-to-End ASR Model using AutoML
Authors
Abstract
Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that an AutoML-based Low-Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selection approaches. However, we show that current AutoML-based search techniques only work up to a certain compression level, beyond which they fail to produce compressed models with an acceptable word error rate (WER). In this work, we propose an iterative AutoML-based LRF approach that achieves over 5x compression without degrading the WER, thereby advancing the state-of-the-art in ASR compression.
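The low-rank factorization underlying this approach can be illustrated with a minimal sketch (an assumption-laden illustration, not the paper's implementation): a weight matrix is replaced by the product of two smaller factors obtained via truncated SVD, and an iterative scheme would repeat factorization at progressively lower ranks, with fine-tuning between steps. The fixed rank schedule below is a placeholder for the AutoML rank search described in the abstract.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n)
    via truncated SVD -- the standard LRF building block."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def compression_ratio(W, A, B):
    """Parameter count of the original matrix vs. the two factors."""
    return W.size / (A.size + B.size)

# Iteratively lower the rank; in the paper's setting each step would be
# followed by fine-tuning, and the rank per layer would be chosen by an
# AutoML search rather than this hypothetical fixed schedule.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))  # stand-in for one ASR weight matrix
for rank in (256, 128, 64):
    A, B = low_rank_factorize(W, rank)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"rank={rank:3d}  compression={compression_ratio(W, A, B):.1f}x  rel_error={err:.3f}")
```

For a 512x512 matrix, rank 64 yields a 4x parameter reduction for that layer; the overall model-level compression reported in the paper depends on per-layer rank choices and which layers are factorized.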