Paper Title
Complexity-Based Prompting for Multi-Step Reasoning
Paper Authors
Paper Abstract
We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that prompts with higher reasoning complexity, i.e., chains with more reasoning steps, achieve substantially better performance on multi-step reasoning tasks over strong baselines. We further extend our complexity-based criteria from prompting (selecting inputs) to decoding (selecting outputs), where we sample multiple reasoning chains from the model, then choose the majority of generated answers from complex reasoning chains (over simple chains). When used to prompt GPT-3 and Codex, our approach substantially improves multi-step reasoning accuracy and achieves new state-of-the-art (SOTA) performance on three math benchmarks (GSM8K, MultiArith, and MathQA) and two BigBenchHard tasks (Date Understanding and Penguins), with an average +5.3 and up to +18 accuracy improvements. Compared with existing example selection schemes like manual tuning or retrieval-based selection, selection based on reasoning complexity is intuitive, easy to implement, and annotation-efficient. Further results demonstrate the robustness of performance gains from complex prompts under format perturbation and distribution shift.
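The two ideas in the abstract, selecting the most complex annotated examples as prompts and voting over the most complex sampled chains at decoding time, can be summarized in a short sketch. The Python snippet below uses hypothetical helper names (count_steps, select_prompt_examples, complexity_based_vote) and assumes complexity is approximated by the number of newline-separated reasoning steps; it is an illustration of the scheme, not the authors' released implementation.

```python
from collections import Counter

def count_steps(chain: str) -> int:
    """Proxy for reasoning complexity: number of non-empty lines (steps) in a chain of thought."""
    return sum(1 for line in chain.strip().split("\n") if line.strip())

def select_prompt_examples(annotated_examples, num_shots=8):
    """Complexity-based prompting: pick the annotated examples whose
    reasoning chains contain the most steps to build the few-shot prompt."""
    ranked = sorted(annotated_examples,
                    key=lambda ex: count_steps(ex["chain"]),
                    reverse=True)
    return ranked[:num_shots]

def complexity_based_vote(sampled_chains, answers, top_k=10):
    """Complexity-based decoding: keep only the top_k most complex sampled
    chains, then return the majority answer among them."""
    ranked = sorted(zip(sampled_chains, answers),
                    key=lambda pair: count_steps(pair[0]),
                    reverse=True)[:top_k]
    votes = Counter(answer for _, answer in ranked)
    return votes.most_common(1)[0][0]
```

In this sketch, the few-shot prompt is assembled from the examples returned by select_prompt_examples, the model is sampled multiple times on a new input, and complexity_based_vote aggregates the answers from the longer chains rather than from all samples.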