Paper Title
Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation
Paper Authors
Paper Abstract
Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to solve a symbolic addition task by discovering effective arithmetic procedures. Although the problem might seem trivial at first glance, generalizing arithmetic knowledge to operations involving a higher number of terms, possibly composed of longer sequences of digits, has proven extremely challenging for neural networks. Here we show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition. The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution; most notably, it does so by discovering human-like calculation strategies such as place value alignment.
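The "local attention" the abstract refers to restricts each position to attend only to a small neighborhood of nearby positions, which biases the model toward the column-wise operations used in manual addition. The following is a minimal illustrative sketch of windowed (banded) attention in NumPy; the function name, window size, and shapes are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def local_attention(q, k, v, window=1):
    """Attention where position i may only attend to positions j
    with |i - j| <= window (a banded attention mask). Sketch only;
    not the paper's exact architecture."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # (n, n) similarity scores
    n = scores.shape[0]
    idx = np.arange(n)
    band = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(band, scores, -np.inf)  # mask out distant positions
    # numerically stable softmax over the allowed window
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v, w

# Usage: with window=1, position 0 cannot attend to position 3.
rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out, weights = local_attention(q, k, v, window=1)
```

Masking with `-inf` before the softmax makes the attention weights outside the band exactly zero, so information can only flow between adjacent positions in a single layer, a locality constraint that mirrors aligning digits by place value in a grid.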