Paper Title
Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems
Paper Authors
Abstract
Automatic math word problem solving has attracted growing attention in recent years. The evaluation datasets used by previous works have serious limitations in terms of scale and diversity. In this paper, we release a new large-scale and template-rich math word problem dataset named Ape210K. It consists of 210K Chinese elementary-school-level math problems, 9 times the size of Math23K, the largest public dataset to date. Each problem contains both the gold answer and the equations needed to derive that answer. Ape210K is also more diverse, with 56K templates, 25 times more than Math23K. Our analysis shows that solving Ape210K requires not only natural language understanding but also commonsense knowledge. We expect Ape210K to serve as a benchmark for math word problem solving systems. Experiments indicate that state-of-the-art models on the Math23K dataset perform poorly on Ape210K. We propose a copy-augmented and feature-enriched sequence-to-sequence (seq2seq) model, which outperforms existing models by 3.2% on the Math23K dataset and serves as a strong baseline for the Ape210K dataset. The gap between human performance and our baseline model remains significant, calling for further research efforts. We make the Ape210K dataset publicly available at https://github.com/yuantiku/ape210k.
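The abstract states that each Ape210K problem pairs an equation with its gold answer. A minimal sketch of what consuming such a record might look like, assuming a JSON-lines layout with hypothetical field names (`original_text`, `equation`, `ans` — not confirmed by the abstract):

```python
import json

# Hypothetical Ape210K-style record; field names are assumptions based on
# the abstract's description (each problem carries an equation and a gold answer).
sample_line = json.dumps({
    "id": "demo-1",
    "original_text": "A class has 100 sheets of paper. After 20 are used, the "
                     "rest are split evenly among 4 groups. How many per group?",
    "equation": "x=(100-20)/4",
    "ans": "20",
})

def check_record(line: str) -> bool:
    """Parse one JSON-lines record and verify the equation yields the gold answer."""
    rec = json.loads(line)
    rhs = rec["equation"].split("=", 1)[1]      # drop the leading "x="
    value = eval(rhs, {"__builtins__": {}})     # arithmetic-only expression here
    return abs(value - float(rec["ans"])) < 1e-6

print(check_record(sample_line))  # → True
```

A real solver would be trained to generate the `equation` string from `original_text`; the check above only illustrates the equation/answer consistency the dataset is described as providing.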