Paper Title
Making Pre-trained Language Models Better Few-shot Learners
Paper Authors
Paper Abstract
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
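To make the two ideas in the abstract concrete, here is a minimal sketch (not the authors' code) of prompt-based prediction with in-context demonstrations: a classification input is wrapped in a cloze-style template, and the masked language model scores label words at the mask position. The model choice (roberta-base), the template, the label words ("great"/"terrible"), and the demonstration strings are illustrative assumptions; in LM-BFF the templates and label words are generated automatically and the model is then fine-tuned on the few annotated examples.

```python
# Sketch of prompt-based classification with demonstrations, assuming a sentiment task.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "roberta-base"  # a smaller language model, as the abstract suggests
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)

# Hypothetical template and label-word mapping (LM-BFF searches for these automatically).
template = "{sentence} It was {mask}."
label_words = {"positive": " great", "negative": " terrible"}

def label_scores(sentence: str, demonstrations: str = "") -> dict:
    """Score each label by the MLM logit of its label word at the mask position.
    `demonstrations` optionally prepends a few labeled examples rendered in the
    same template, mirroring the demonstration-in-context idea from the abstract."""
    text = demonstrations + template.format(sentence=sentence, mask=tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # Locate the mask position in the tokenized input.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    scores = {}
    for label, word in label_words.items():
        word_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))[0]
        scores[label] = logits[0, mask_pos, word_id].item()
    return scores

# Usage: demonstrations are labeled examples filled into the same template.
demos = ("A gripping, beautifully shot film. It was great. "
         "Dull and overlong. It was terrible. ")
print(label_scores("An unforgettable performance by the lead actor.", demos))
```

In this zero-update form the snippet only illustrates the prompting format; the paper's method additionally fine-tunes the model with a cross-entropy loss over the label-word logits and selects which demonstrations to concatenate for each input rather than sampling them at random.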