Title
Iterative Compression of End-to-End ASR Model using AutoML
Authors
Abstract
Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that an AutoML-based Low-Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selection approaches. However, we show that current AutoML-based search techniques only work up to a certain compression level, beyond which they fail to produce compressed models with an acceptable word error rate (WER). In this work, we propose an iterative AutoML-based LRF approach that achieves over 5x compression without degrading the WER, thereby advancing the state-of-the-art in ASR compression.
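The low-rank factorization underlying this approach can be illustrated with a minimal sketch (an assumption-laden illustration, not the paper's implementation): a weight matrix is replaced by the product of two smaller factors obtained via truncated SVD, and an iterative scheme would repeat factorization at progressively lower ranks, with fine-tuning between steps. The fixed rank schedule below is a placeholder for the AutoML rank search described in the abstract.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n)
    via truncated SVD -- the standard LRF building block."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def compression_ratio(W, A, B):
    """Parameter count of the original matrix vs. the two factors."""
    return W.size / (A.size + B.size)

# Iteratively lower the rank; in the paper's setting each step would be
# followed by fine-tuning, and the rank per layer would be chosen by an
# AutoML search rather than this hypothetical fixed schedule.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))  # stand-in for one ASR weight matrix
for rank in (256, 128, 64):
    A, B = low_rank_factorize(W, rank)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"rank={rank:3d}  compression={compression_ratio(W, A, B):.1f}x  rel_error={err:.3f}")
```

For a 512x512 matrix, rank 64 yields a 4x parameter reduction for that layer; the overall model-level compression reported in the paper depends on per-layer rank choices and which layers are factorized.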