Paper Title

HyperTuning: Toward Adapting Large Language Models without Back-propagation

Paper Authors

Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen

Paper Abstract

Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization. We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model. We demonstrate a simple setup for hypertuning with HyperT5, a T5-based hypermodel that produces soft prefixes or LoRA parameters for a frozen T5 model from few-shot examples. We train HyperT5 in two stages: first, hyperpretraining with a modified conditional language modeling objective that trains a hypermodel to generate parameters; second, multi-task fine-tuning (MTF) on a large number of diverse language tasks. We evaluate HyperT5 on P3, MetaICL and Super-NaturalInstructions datasets, and show that it can effectively generate parameters for unseen tasks. Moreover, we show that using hypermodel-generated parameters as initializations for further parameter-efficient fine-tuning improves performance. HyperTuning can thus be a flexible and efficient way to leverage large language models for diverse downstream applications.
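To make the core idea concrete, below is a minimal, hypothetical sketch of hypertuning: a small hypermodel reads embedded few-shot examples and emits LoRA parameters that are plugged into a frozen downstream layer, so only the hypermodel would ever be trained. This is not the paper's HyperT5 implementation; the GRU encoder, the single linear layer, and all shapes and names are illustrative assumptions standing in for the T5-based setup described in the abstract.

```python
# Minimal sketch (NOT the authors' HyperT5 implementation) of the hypertuning idea:
# a hypermodel maps few-shot examples to LoRA parameters for a frozen layer.
# All module names, shapes, and the GRU encoder are illustrative assumptions.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer whose LoRA update (A, B) is supplied externally."""

    def __init__(self, base: nn.Linear, rank: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the downstream model stays frozen
        self.rank = rank
        self.lora_A = None  # (rank, in_features), produced by the hypermodel
        self.lora_B = None  # (out_features, rank), produced by the hypermodel

    def forward(self, x):
        out = self.base(x)
        if self.lora_A is not None and self.lora_B is not None:
            # Low-rank additive update: x @ A^T @ B^T
            out = out + x @ self.lora_A.t() @ self.lora_B.t()
        return out


class HyperModel(nn.Module):
    """Encodes few-shot examples into a vector, then maps it to LoRA weights."""

    def __init__(self, d_model: int, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.to_A = nn.Linear(d_model, rank * in_features)
        self.to_B = nn.Linear(d_model, out_features * rank)
        self.rank, self.in_features, self.out_features = rank, in_features, out_features

    def forward(self, fewshot_embeddings):  # (1, num_tokens, d_model)
        _, h = self.encoder(fewshot_embeddings)
        h = h[-1, 0]  # final hidden state of the single few-shot sequence
        A = self.to_A(h).view(self.rank, self.in_features)
        B = self.to_B(h).view(self.out_features, self.rank)
        return A, B


if __name__ == "__main__":
    d_model, rank = 32, 4
    frozen_layer = LoRALinear(nn.Linear(d_model, d_model), rank)
    hyper = HyperModel(d_model, d_model, d_model, rank)

    fewshot = torch.randn(1, 16, d_model)  # stand-in for embedded few-shot examples
    frozen_layer.lora_A, frozen_layer.lora_B = hyper(fewshot)

    x = torch.randn(2, d_model)
    print(frozen_layer(x).shape)  # torch.Size([2, 32])
```

In this toy setup, gradients would flow only through the hypermodel's parameters, mirroring the paper's claim that the downstream model needs no gradient-based tuning; the abstract's second use case (hypermodel output as an initialization for further parameter-efficient fine-tuning) would correspond to un-freezing only the generated A and B.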
