Paper Title

Mixed-Effects Transformers for Hierarchical Adaptation

Paper Authors

Julia White, Noah Goodman, Robert Hawkins

Paper Abstract

Language use differs dramatically from context to context. To some degree, modern language models like GPT-3 are able to account for such variance by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes -- lightweight modules prepended to the input -- to account for structured variation. Specifically, we show how the popular class of mixed-effects models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it efficiently adapts to novel contexts with minimal data while still effectively generalizing to unseen contexts.
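
To make the idea concrete, the sketch below shows one way a hierarchically-structured prefix with dropout regularization might be implemented in PyTorch: a shared ("fixed-effect") prefix plus per-context ("random-effect") offsets, where dropout on the offsets shrinks them toward the shared prefix. The class name, dimensions, and initialization are illustrative assumptions based on the abstract, not the authors' released code.

```python
import torch
import torch.nn as nn

class HierarchicalPrefix(nn.Module):
    """Shared ("fixed-effect") prefix plus per-context ("random-effect")
    offsets, with dropout on the offsets so the model learns to fall back
    on the shared prefix for sparse or unseen contexts. Hypothetical
    sketch of the approach described in the abstract."""

    def __init__(self, num_contexts: int, prefix_len: int = 10,
                 hidden_dim: int = 768, p_drop: float = 0.1):
        super().__init__()
        # Population-level prefix shared by every context.
        self.shared = nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)
        # One offset per context (e.g., per author or subreddit),
        # initialized to zero so each context starts at the shared prefix.
        self.context = nn.Embedding(num_contexts, prefix_len * hidden_dim)
        nn.init.zeros_(self.context.weight)
        # Dropping out the context-specific component shrinks it toward
        # zero, standing in for the regularized prefix-tuning procedure
        # the abstract mentions.
        self.dropout = nn.Dropout(p_drop)
        self.prefix_len, self.hidden_dim = prefix_len, hidden_dim

    def forward(self, context_ids: torch.Tensor) -> torch.Tensor:
        # context_ids: (batch,) integer ids of the contexts in the batch.
        offsets = self.context(context_ids).view(
            -1, self.prefix_len, self.hidden_dim)
        # An unseen context corresponds to an all-zero offset, i.e., the
        # shared prefix alone.
        return self.shared.unsqueeze(0) + self.dropout(offsets)

# The resulting (batch, prefix_len, hidden_dim) tensor would be prepended
# to the frozen language model's input embeddings, as in standard
# prefix-tuning; only the prefix parameters are trained.
prefixes = HierarchicalPrefix(num_contexts=100)(torch.tensor([3, 7]))
print(prefixes.shape)  # torch.Size([2, 10, 768])
```

The dropout here plays a role analogous to shrinkage (partial pooling) in classical mixed-effects models: context-specific parameters are pulled toward the population-level prefix, which is what allows generalization to contexts unseen at training time.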
