Paper Title
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Paper Authors
Paper Abstract
Pretrained large language models (LLMs) are general-purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performance, which is undesirable whenever the fine-tuned model needs to handle additional tasks where no fine-tuning data is available. In this work, we first demonstrate that fine-tuning on a single task indeed decreases LLMs' general in-context learning performance. We discover one important cause of such forgetting, format specialization, where the model overfits to the format of the fine-tuned task. We further show that format specialization happens at the very beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that reduces format specialization and improves generalization. ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt attached. With experiments on several fine-tuning tasks and 8 in-context evaluation tasks, we show that ProMoT achieves comparable performance on fine-tuned tasks to standard fine-tuning, but with much less loss of in-context learning performance across a broad range of out-of-domain evaluation tasks. More importantly, ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task, e.g. ProMoT on En-Fr translation significantly improves performance on other language pairs, and ProMoT on NLI improves performance on summarization. Experiments also show that ProMoT can improve the generalization performance of multi-task training.
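
The abstract only names the two stages, so a minimal sketch of the procedure may help. The following is an illustrative Python sketch, not the paper's implementation: it assumes a HuggingFace-style causal LM ("gpt2" as a small stand-in for the LLMs used in the paper), and the prompt length, learning rates, step counts, and the `task_batches` iterator are all hypothetical placeholders.

```python
# Sketch of ProMoT's two stages: (1) prompt tuning with the model frozen,
# (2) model tuning with the learned soft prompt frozen and attached.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper fine-tunes much larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_prompt_tokens = 20  # illustrative choice, not the paper's setting
hidden = model.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden) * 0.02)

def task_batches(batch_size=4, seq_len=32):
    # Hypothetical data iterator; replace with the real fine-tuning set.
    while True:
        ids = torch.randint(0, tokenizer.vocab_size, (batch_size, seq_len))
        yield ids, ids.clone()

def loss_with_prompt(input_ids, labels):
    # Prepend the soft prompt to the token embeddings.
    tok_emb = model.get_input_embeddings()(input_ids)            # (B, T, H)
    prompt = soft_prompt.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)          # (B, P+T, H)
    # Mask out the loss on soft-prompt positions.
    pad = torch.full((labels.size(0), num_prompt_tokens), -100,
                     dtype=labels.dtype, device=labels.device)
    return model(inputs_embeds=inputs_embeds,
                 labels=torch.cat([pad, labels], dim=1)).loss

def train(params, batches, lr, steps):
    opt = torch.optim.AdamW(params, lr=lr)
    for _, (input_ids, labels) in zip(range(steps), batches):
        loss_with_prompt(input_ids, labels).backward()
        opt.step()
        opt.zero_grad()

# Stage 1: prompt tuning -- the model is frozen, so the soft prompt
# absorbs the task-specific format.
for p in model.parameters():
    p.requires_grad_(False)
train([soft_prompt], task_batches(), lr=1e-3, steps=500)

# Stage 2: model tuning -- the soft prompt is frozen but kept attached,
# and the model itself is fine-tuned on the task.
soft_prompt.requires_grad_(False)
for p in model.parameters():
    p.requires_grad_(True)
train(model.parameters(), task_batches(), lr=1e-5, steps=2000)
```

Since the abstract describes the soft prompt as "additional and removable parameters", the intent of this design is that `soft_prompt` carries the task-specific format and can be detached when the fine-tuned model is applied to other in-context tasks.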