Paper Title
A Survey on Model Compression and Acceleration for Pretrained Language Models
Paper Authors
Paper Abstract
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long inference delay of Transformer-based pretrained language models (PLMs) have prevented their broader adoption, including in edge and mobile computing. Efficient NLP research aims to comprehensively consider the computation, time, and carbon emissions of the entire NLP life-cycle, including data preparation, model training, and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics, and methodology.