STaR: Self-Taught Reasoner
Bootstrapping Reasoning With Reasoning

Eric Zelikman¹, Yuhuai Wu¹², Jesse Mu¹, Noah D. Goodman¹
¹Department of Computer Science, Stanford University   ²Google Research
{ezelikman, yuhuai, muj, ngoodman}@stanford.edu

Abstract

Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30× larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.

1 Introduction

Human decision-making is often the result of extended chains of thought [1, 2]. Recent work has shown that explicit intermediate reasoning ("rationales") can improve large language model (LLM) performance as well [3–8]. For example, [5] demonstrated that LLMs explicitly trained to use "scratchpads" for intermediate steps can attain perfect in-distribution performance on arithmetic, and strong out-of-distribution generalization, while models trained to predict answers directly fail to do either. These works suggest that generating explicit rationales before giving a final answer ("rationale generation") is valuable for LLMs across diverse tasks including mathematical reasoning, commonsense reasoning, code evaluation, social bias inference, and natural language inference.

However, the two primary methods for inducing rationale generation both have serious drawbacks. One approach to rationale generation is the construction of a fine-tuning dataset of rationales, either manually by human annotators or automatically with hand-crafted templates [3–5, 9]. Manual methods are expensive, and it is infeasible to construct such a dataset for each interesting problem [3]. Meanwhile, template-based methods rely on automatically-generated rationales but only work when a general solution is already known [5] or reasonable hard-coded heuristics can be made [4].

An alternative is to leverage in-context learning by including only a few rationale examples in the language model prompt. This has been shown to improve accuracy on mathematical and symbolic reasoning tasks relative to prompting without rationales ("direct" prompting) [5, 6]. Yet, while few-shot techniques with rationales tend to outperform their non-reasoning counterparts, they generally substantially underperform models fine-tuned to directly predict answers using larger datasets [5, 6].
*These authors contributed equally to this work.
arXiv:2203.14465v2 [cs.LG] 20 May 2022

Figure 1: An overview of STaR and a STaR-generated rationale on CommonsenseQA. We indicate the fine-tuning outer loop with a dashed line. (Diagram: the language model generates rationales and answers for questions; wrong answers are retried with the correct answer as a hint ("rationalization"); rationales that reach the correct answer are used to fine-tune the model.)

STaR-generated rationale shown in the figure:
Q: What can be used to carry a small dog? Answer Choices: (a) swimming pool (b) basket (c) dog show (d) backyard (e) own home
A: The answer must be something that can be used to carry a small dog. Baskets are designed to hold things. Therefore, the answer is basket (b).
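To make the loop described in the abstract and Figure 1 concrete, the following Python sketch spells out its three steps: rationale generation, rationalization with the correct answer given as a hint, and fine-tuning on rationales that led to correct answers. The model interface (generate, finetune), the extract_answer helper, and the hint format are hypothetical stand-ins for illustration, not the paper's actual implementation.

    # Minimal sketch of the STaR loop, assuming a language-model object with
    # hypothetical `generate(prompt)` and `finetune(examples)` methods.

    def extract_answer(rationale: str) -> str:
        """Pull the final answer out of a rationale that ends with
        '... the answer is X', as in the Figure 1 example."""
        return rationale.rsplit("answer is", 1)[-1].strip(" .")

    def star(base_model, dataset, few_shot_prompt, n_iterations=10):
        """dataset: (question, correct_answer) pairs; no rationales required."""
        model = base_model
        for _ in range(n_iterations):
            train_examples = []
            for question, answer in dataset:
                # 1. Rationale generation: few-shot prompt the current model to
                #    produce a rationale followed by an answer.
                rationale = model.generate(few_shot_prompt + question)
                if extract_answer(rationale) == answer:
                    train_examples.append((question, rationale))
                else:
                    # 2. Rationalization: retry with the correct answer as a hint;
                    #    keep the rationale (without the hint) if it now succeeds.
                    hinted = model.generate(
                        few_shot_prompt + question + f" (hint: the answer is {answer})")
                    if extract_answer(hinted) == answer:
                        train_examples.append((question, hinted))
            # 3. Fine-tune on all rationales that yielded correct answers, then repeat.
            #    (The paper fine-tunes from the original pre-trained model at each
            #    iteration rather than continuing from the previous checkpoint.)
            model = base_model.finetune(train_examples)
        return model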
