STaR: Self-Taught Reasoner
Bootstrapping Reasoning With Reasoning
Eric Zelikman*1, Yuhuai Wu*1,2, Jesse Mu1, Noah D. Goodman1
1Department of Computer Science, Stanford University
2Google Research
{ezelikman, yuhuai, muj, ngoodman}@stanford.edu
Abstract
Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30× larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
1 Introduction
Human decision-making is often the result of extended chains of thought [1, 2]. Recent work has shown that explicit intermediate reasoning (“rationales”) can improve large language model (LLM) performance as well [3–8]. For example, [5] demonstrated that LLMs explicitly trained to use “scratchpads” for intermediate steps can attain perfect in-distribution performance on arithmetic, and strong out-of-distribution generalization, while models trained to predict answers directly fail to do either. These works suggest that generating explicit rationales before giving a final answer (“rationale generation”) is valuable for LLMs across diverse tasks including mathematical reasoning, commonsense reasoning, code evaluation, social bias inference, and natural language inference.
However, the two primary methods for inducing rationale generation both have serious drawbacks. One approach to rationale generation is the construction of a fine-tuning dataset of rationales, either manually by human annotators or automatically with hand-crafted templates [3–5, 9]. Manual methods are expensive, and it is infeasible to construct such a dataset for each interesting problem [3]. Meanwhile, template-based methods rely on automatically-generated rationales but only work when a general solution is already known [5] or reasonable hard-coded heuristics can be made [4].

An alternative is to leverage in-context learning by including only a few rationale examples in the language model prompt. This has been shown to improve accuracy on mathematical and symbolic reasoning tasks relative to prompting without rationales (“direct” prompting) [5, 6]. Yet, while few-shot techniques with rationales tend to outperform their non-reasoning counterparts, they generally substantially underperform models fine-tuned to directly predict answers using larger datasets [5, 6].
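To make this contrast concrete, the following is a minimal sketch of the two prompting styles, reusing the CommonsenseQA example from Figure 1 as the single worked example. The template strings and the `build_prompt` helper are illustrative, not from the paper.

```python
# Illustrative contrast between "direct" prompting and few-shot rationale
# prompting. The worked example is the one shown in Figure 1; the template
# layout and helper name are assumptions made for illustration.

DIRECT_PROMPT = """Q: What can be used to carry a small dog?
Answer Choices: (a) swimming pool (b) basket (c) dog show (d) backyard (e) own home
A: (b)

Q: {question}
A:"""

RATIONALE_PROMPT = """Q: What can be used to carry a small dog?
Answer Choices: (a) swimming pool (b) basket (c) dog show (d) backyard (e) own home
A: The answer must be something that can be used to carry a small dog.
Baskets are designed to hold things. Therefore, the answer is basket (b).

Q: {question}
A:"""

def build_prompt(question: str, with_rationale: bool = True) -> str:
    """Prepend a worked example so the model imitates its format:
    the rationale version elicits reasoning before the final answer."""
    template = RATIONALE_PROMPT if with_rationale else DIRECT_PROMPT
    return template.format(question=question)
```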
*These authors contributed equally to this work.

[Figure 1 diagram: a question is fed to the language model for rationale generation; rationales that reach the correct answer are collected to fine-tune the model, while wrong answers are routed through a “rationalize” step in which the correct answer is given as a hint. Example STaR-generated rationale shown in the figure: “Q: What can be used to carry a small dog? Answer Choices: (a) swimming pool (b) basket (c) dog show (d) backyard (e) own home. A: The answer must be something that can be used to carry a small dog. Baskets are designed to hold things. Therefore, the answer is basket (b).”]
Figure 1: An overview of STaR and a STaR-generated rationale on CommonsenseQA. We indicate the fine-tuning outer loop with a dashed line.
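The loop depicted in Figure 1 can be summarized in a few lines of code. Below is a minimal sketch, assuming hypothetical `generate`, `answer_of`, `finetune`, and `hint_fn` callables (rationale sampling, final-answer extraction, fine-tuning, and hint-prompt construction, respectively); these names are illustrative and do not come from the paper's released code.

```python
# A minimal sketch of the STaR outer loop described in the abstract and
# Figure 1. The callables passed in are assumptions made for illustration:
#   generate(model, prompt) -> sampled rationale ending in an answer
#   answer_of(text)         -> the final answer extracted from a rationale
#   finetune(model, data)   -> a model fine-tuned on (question, rationale) pairs
#   hint_fn(q, a)           -> a prompt that includes the correct answer as a hint

def star(model0, dataset, few_shot_prompt, hint_fn,
         generate, answer_of, finetune, n_iterations=10):
    """dataset: iterable of (question, correct_answer) pairs, no rationales."""
    model = model0
    for _ in range(n_iterations):
        train_data = []
        for question, correct in dataset:
            # Rationale generation: attempt the question with few-shot prompting.
            attempt = generate(model, few_shot_prompt + question)
            if answer_of(attempt) == correct:
                train_data.append((question, attempt))
            else:
                # Rationalization: retry with the correct answer given as a hint;
                # keep the rationale (without the hint) only if it now reaches
                # the correct answer.
                retry = generate(model, hint_fn(question, correct))
                if answer_of(retry) == correct:
                    train_data.append((question, retry))
        # Fine-tune on all rationales that ultimately yielded correct answers.
        # Per the paper's method section, each iteration fine-tunes from the
        # original model rather than the previous checkpoint, to avoid overfitting.
        model = finetune(model0, train_data)
    return model
```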