CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

Paper title

CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

Authors

Lee, Changyoon; Seonwoo, Yeon; Oh, Alice

Abstract

We introduce CS1QA, a dataset for code-based question answering in the programming education domain. CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python, and 17,698 unannotated chat data with code. Each question is accompanied by the student's code and by the portion of the code relevant to answering the question. We carefully design the annotation process to construct CS1QA and analyze the collected dataset in detail. The tasks for CS1QA are to predict the question type, to predict the relevant code snippet given the question and the code, and to retrieve an answer from the annotated corpus. Results of experiments on several baseline models are reported and thoroughly analyzed. The tasks for CS1QA challenge models to understand both code and natural language. This unique dataset can be used as a benchmark for source code comprehension and question answering in the educational setting.
