Efficient Fine-Tuning of Compressed Language Models with Learners

Paper title

Efficient Fine-Tuning of Compressed Language Models with Learners

Authors

Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

Abstract

Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the computational challenges of training to downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of pre-trained language models to gain benefits in convergence speed and resource utilization. Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores. Our results on DistilBERT demonstrate that learners perform on par with or surpass the baselines. Learners train 7x fewer parameters than state-of-the-art methods on GLUE. On CoLA, learners fine-tune 20% faster, and have significantly lower resource utilization.

Paper title