论文标题
Isarstep:高级数学推理的基准
IsarStep: a Benchmark for High-level Mathematical Reasoning
论文作者
论文摘要
定义明确的基准对于测量和加速机器学习模型的研究进度至关重要。在本文中,我们为高级数学推理提供了一个基准,并研究了神经序列与序列模型的推理能力。我们从定理供者人类专家撰写的最大的证明证明存储库中构建了一个非合成数据集。该数据集对本科和研究级数学和计算机科学定理的广泛覆盖范围。在我们确定的任务中,需要模型来填写围绕证明的缺失的中间命题。此任务为使机器自动生成人类可读的证据的长期目标提供了一个起点。我们的实验和分析表明,尽管任务具有挑战性,但神经模型可以捕获非平凡的数学推理。我们进一步设计了一个层次变压器,该变压器的表现优于变压器基线。
A well-defined benchmark is essential for measuring and accelerating research progress of machine learning models. In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover. The dataset has a broad coverage of undergraduate and research-level mathematical and computer science theorems. In our defined task, a model is required to fill in a missing intermediate proposition given surrounding proofs. This task provides a starting point for the long-term goal of having machines generate human-readable proofs automatically. Our experiments and analysis reveal that while the task is challenging, neural models can capture non-trivial mathematical reasoning. We further design a hierarchical transformer that outperforms the transformer baseline.