Paper Title


What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Authors

Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark

Abstract


The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a given string matches a regular expression (viewed as an instruction) to identify properties of tasks, instructions, and instances that make instruction learning challenging. For instance, we find that our model, a fine-tuned T5-based text2text transformer, struggles with large regular languages, suggesting that less precise instructions are challenging for models. Additionally, instruction executions that require tracking longer contexts of prior steps are also more difficult. We use our findings to systematically construct a challenging instruction learning dataset, which we call Hard RegSet. Fine-tuning on Hard RegSet, our large transformer learns to correctly interpret only 65.6% of test instructions (with at least 90% accuracy), and 11%-24% of the instructions in out-of-distribution generalization settings. We propose Hard RegSet as a challenging instruction learning task, and a controlled environment for studying instruction learning.
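To make the task setup concrete: each example pairs a regular expression (the instruction) with a candidate string (the instance), and the label records whether the string is in the language the regex describes. The sketch below builds such examples; the text2text serialization shown is an illustrative assumption, not the paper's exact input format.

```python
import re
import random

def make_example(pattern: str, string: str) -> dict:
    """Build one instruction-learning example: the regex is the
    instruction, the string is the instance, and the target says
    whether the full string matches the pattern."""
    label = "true" if re.fullmatch(pattern, string) else "false"
    return {
        # Hypothetical text2text serialization (assumed, not from the paper).
        "input": f"instruction: {pattern} instance: {string}",
        "target": label,
    }

def sample_binary_strings(n: int, max_len: int, seed: int = 0) -> list:
    """Sample random binary strings to use as candidate instances."""
    rng = random.Random(seed)
    return ["".join(rng.choice("01") for _ in range(rng.randint(1, max_len)))
            for _ in range(n)]

# Example instruction: the language of binary strings containing an
# even number of 1s, written as the regex (0*10*10*)*0*.
pattern = "(0*10*10*)*0*"
for s in sample_binary_strings(5, 8):
    ex = make_example(pattern, s)
    print(ex["input"], "->", ex["target"])
```

Varying properties of the regex (e.g. language size) and of the instances along these lines is what lets the environment isolate which factors make instruction learning hard.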
