Title

Unbiased Math Word Problems Benchmark for Mitigating Solving Bias

Authors

Yang, Zhicheng; Qin, Jinghui; Chen, Jiaqi; Liang, Xiaodan

Abstract

In this paper, we revisit the solving bias that arises when evaluating models on current Math Word Problem (MWP) benchmarks. Current solvers suffer from solving bias, which consists of data bias and learning bias caused by biased datasets and improper training strategies. Our experiments verify that MWP solvers are easily biased by training datasets that do not cover diverse questions for each problem narrative, so a solver learns only shallow heuristics rather than the deep semantics needed to understand problems. In addition, an MWP can naturally be solved by multiple equivalent equations, whereas current datasets take only one of them as the ground truth, forcing the model to match the labeled equation and ignore the other equivalent ones. We first introduce a novel MWP dataset named UnbiasedMWP, constructed by varying the grounded expressions in our collected data and manually annotating them with multiple corresponding new questions. Then, to further mitigate learning bias, we propose a Dynamic Target Selection (DTS) strategy that dynamically selects a more suitable target expression according to the longest prefix match between the current model output and candidate equivalent equations, which are obtained by applying the commutative law during training. The results show that UnbiasedMWP has significantly fewer biases than its original data and other datasets, making it a promising benchmark for fairly evaluating solvers' reasoning skills rather than their ability to match nearest neighbors. Solvers trained with DTS also achieve higher accuracy on multiple MWP benchmarks. The source code is available at https://github.com/yangzhch6/UnbiasedMWP.
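
The target-selection idea behind DTS can be illustrated with a small sketch. It is not the authors' implementation (that lives in the linked repository), and several details are assumptions: equations are token lists in prefix (Polish) order, numbers appear as hypothetical placeholders such as `N1`, only `+` and `*` are treated as commutative, and ties in prefix length fall back to the original label. All function names (`parse_prefix`, `commutative_variants`, `candidate_targets`, `select_target`) are hypothetical.

```python
from itertools import product
from typing import List, Tuple, Union

# Assumed operator vocabulary; only + and * are swapped (commutative law).
OPERATORS = {"+", "-", "*", "/", "^"}
COMMUTATIVE = {"+", "*"}

# A tree is either a leaf token (str) or a tuple (operator, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def parse_prefix(tokens: List[str]) -> Tree:
    """Parse a prefix-order token list into a binary expression tree."""
    def helper(i: int) -> Tuple[Tree, int]:
        tok = tokens[i]
        if tok in OPERATORS:
            left, j = helper(i + 1)
            right, k = helper(j)
            return (tok, left, right), k
        return tok, i + 1

    tree, end = helper(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return tree


def to_prefix(tree: Tree) -> List[str]:
    """Serialize an expression tree back into prefix-order tokens."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return [op] + to_prefix(left) + to_prefix(right)


def commutative_variants(tree: Tree) -> List[Tree]:
    """All trees reachable by swapping operands of commutative nodes.

    The unmodified tree is always first; duplicates are removed in order.
    """
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    variants: List[Tree] = []
    for lhs, rhs in product(commutative_variants(left), commutative_variants(right)):
        variants.append((op, lhs, rhs))
        if op in COMMUTATIVE:
            variants.append((op, rhs, lhs))
    return list(dict.fromkeys(variants))  # order-preserving de-duplication


def candidate_targets(label: List[str]) -> List[List[str]]:
    """Equivalent target equations for one labeled equation."""
    return [to_prefix(t) for t in commutative_variants(parse_prefix(label))]


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def select_target(model_output: List[str], label: List[str]) -> List[str]:
    """Pick the candidate sharing the longest prefix with the model output.

    max() returns the first maximal item, so ties resolve to the original label.
    """
    return max(candidate_targets(label), key=lambda c: common_prefix_len(model_output, c))


if __name__ == "__main__":
    label = ["+", "N1", "*", "N2", "N3"]        # N1 + N2 * N3
    prediction = ["+", "*", "N3", "N1", "N2"]   # hypothetical model output
    print(select_target(prediction, label))     # ['+', '*', 'N3', 'N2', 'N1']
```

In a training loop, the selected sequence would presumably replace the fixed label as the target for the loss, so a model that starts from `N2 * N3` is not penalized merely for ordering the operands differently from the annotation. Because the candidate set can grow as 2^k for k commutative nodes, this brute-force enumeration is only reasonable for short equations.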
