Paper Title
SC-Ques: A Sentence Completion Question Dataset for English as a Second Language Learners
Paper Authors
Paper Abstract
Sentence completion (SC) questions present a sentence with one or more blanks to be filled in, with three to five candidate words or phrases offered as options. SC questions are widely used to assess students learning English as a Second Language (ESL). In this paper, we present a large-scale SC dataset, \textsc{SC-Ques}, which consists of 289,148 ESL SC questions drawn from real-world standardized English examinations. Furthermore, we build a comprehensive benchmark for automatically solving SC questions by training large-scale pre-trained language models on the proposed \textsc{SC-Ques} dataset. We conduct a detailed analysis of the baseline models' performance, limitations, and trade-offs. The data and our code are available for research purposes at \url{https://github.com/ai4ed/SC-Ques}.