Paper Title
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
Paper Authors
Paper Abstract
There is growing interest in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression have not been thoroughly explored yet. Model compression could provide the benefits of reducing memory footprints, enabling low-precision computations, and ultimately achieving cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, which consists of post-training quantization of the pre-trained language model and fine-tuning only some parts of the quantized parameters for a target task. Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors. During the adaptation phase, the binary values are frozen for all tasks, while the scaling factors are fine-tuned for the downstream task. We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving a >10x compression ratio under 4-bit quantization and a >1,000x reduction in the number of trainable parameters.
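The abstract describes binary-coding quantization (approximating each full-precision weight matrix as a sum of binary matrices times scaling factors) and an adaptation scheme that freezes the binary codes while fine-tuning only the scaling factors. Below is a minimal sketch of that idea, assuming a PyTorch-style greedy quantizer and a hypothetical `AlphaTunedLinear` module; it illustrates the described mechanism and is not the authors' released implementation.

```python
import torch
import torch.nn as nn

def greedy_bcq(weight: torch.Tensor, num_bits: int = 4):
    """Greedy binary-coding quantization: W ~= sum_i alpha_i * B_i,
    with B_i in {-1, +1} and per-row scaling factors alpha_i."""
    residual = weight.clone()
    binaries, alphas = [], []
    for _ in range(num_bits):
        b = torch.sign(residual)
        b[b == 0] = 1.0                                # avoid zero codes
        a = residual.abs().mean(dim=1, keepdim=True)   # per-row scaling factor
        binaries.append(b)
        alphas.append(a)
        residual = residual - a * b
    return torch.stack(binaries), torch.stack(alphas)

class AlphaTunedLinear(nn.Module):
    """Linear layer whose binary codes are frozen buffers and whose
    scaling factors (alphas) are the only trainable weight parameters."""
    def __init__(self, linear: nn.Linear, num_bits: int = 4):
        super().__init__()
        B, A = greedy_bcq(linear.weight.data, num_bits)
        self.register_buffer("binary", B)   # frozen for all tasks
        self.alpha = nn.Parameter(A)        # fine-tuned per downstream task
        self.bias = (nn.Parameter(linear.bias.data.clone())
                     if linear.bias is not None else None)

    def forward(self, x):
        # Reconstruct the weight from frozen binaries and trainable alphas.
        w = (self.alpha * self.binary).sum(dim=0)
        return nn.functional.linear(x, w, self.bias)
```

In this sketch, only `alpha` (and optionally the bias) receives gradients during adaptation, which is how the trainable-parameter count can drop by orders of magnitude relative to full fine-tuning, while the frozen binary codes plus scaling factors give the storage reduction of low-bit quantization.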