Title
Repairing Bugs in Python Assignments Using Large Language Models
Authors
Abstract
Students often make mistakes on their introductory programming assignments as part of their learning process. Unfortunately, providing custom repairs for these mistakes can require a substantial amount of time and effort from class instructors. Automated program repair (APR) techniques can be used to synthesize such fixes. Prior work has explored the use of symbolic and neural techniques for APR in the education domain. Both types of approaches require either substantial engineering effort or large amounts of data and training. We propose using a large language model trained on code, such as Codex, to build an APR system -- MMAPR -- for introductory Python programming assignments. Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shot examples, and program chunking. We evaluate MMAPR on 286 real student programs and compare it against a baseline built by combining BIFI, a state-of-the-art Python syntax repair engine, with Refactory, a state-of-the-art Python semantic repair engine for student assignments. We find that MMAPR fixes more programs and produces smaller patches on average.
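The abstract mentions iterative querying and test-case-based selection. As a rough, hypothetical illustration (not the paper's actual implementation), the core loop might resemble the sketch below: repeatedly sample candidate repairs from a code LLM, run each against the assignment's test cases, and keep the candidate that passes the most tests, preferring smaller patches on ties. The function names (`repair`, `run_tests`, `patch_size`) and the stubbed `query_model` are assumptions made for this sketch.

```python
import difflib

def run_tests(program_src, test_cases):
    """Count how many (expression, expected-value) test cases pass."""
    env = {}
    try:
        exec(program_src, env)  # student programs define functions
    except Exception:
        return 0  # program does not even load (e.g. syntax error)
    passed = 0
    for expr, expected in test_cases:
        try:
            if eval(expr, env) == expected:
                passed += 1
        except Exception:
            pass  # runtime error counts as a failed test
    return passed

def patch_size(original, candidate):
    """Approximate patch size as the number of added/removed lines."""
    diff = difflib.ndiff(original.splitlines(), candidate.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

def repair(buggy_src, test_cases, query_model, max_rounds=3, n_samples=4):
    """Iteratively sample candidate fixes from the model; keep the one
    passing the most tests, breaking ties by smaller patch size."""
    best = buggy_src
    best_score = (run_tests(buggy_src, test_cases), 0)
    for _ in range(max_rounds):
        for cand in query_model(best, n_samples):
            score = (run_tests(cand, test_cases),
                     -patch_size(buggy_src, cand))
            if score > best_score:
                best, best_score = cand, score
        if best_score[0] == len(test_cases):
            break  # all tests pass; no need to query further
    return best
```

For example, with a stub model that proposes `return a + b` for a student's buggy `return a - b`, the loop would accept the fix because it passes all test cases while changing only two lines.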