Paper Title

Do Transformers know symbolic rules, and would we know if they did?

Authors

Tommi Gröndahl, Yujia Guo, N. Asokan

Abstract

To improve the explainability of leading Transformer networks used in NLP, it is important to tease apart genuine symbolic rules from merely associative input-output patterns. However, we identify several inconsistencies in how "symbolicity" has been construed in recent NLP literature. To mitigate this problem, we propose two criteria as the most relevant, one pertaining to a system's internal architecture and the other to the dissociation between abstract rules and specific input identities. From this perspective, we critically examine prior work on the symbolic capacities of Transformers, and deem the results to be fundamentally inconclusive for reasons inherent in experimental design. We further maintain that there is no simple fix to this problem, since it arises, to an extent, in all end-to-end settings. Nonetheless, we emphasize the need for more robust evaluation of whether non-symbolic explanations exist for success in seemingly symbolic tasks. To facilitate this, we experiment on four sequence modelling tasks with the T5 Transformer in two experimental settings: zero-shot generalization, and generalization across class-specific vocabularies flipped between the training and test sets. We observe that T5's generalization is markedly stronger in sequence-to-sequence tasks than in comparable classification tasks. Based on this, we propose a thus far overlooked analysis, where the Transformer itself does not need to be symbolic to be part of a symbolic architecture: it can serve as the processor, operating on the input and output as external memory components.
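
The flipped-vocabulary setting described in the abstract can be made concrete with a small sketch. The following is a hypothetical toy illustration, not the paper's actual four tasks, data, or code: a positional-rule task in T5's text-to-text format, where class-specific vocabularies swap slots between training and test, so that only a rule independent of specific input identities transfers; a held-out vocabulary gives a zero-shot split.

```python
# Hypothetical sketch of the two evaluation settings named in the abstract.
# The toy task and vocabularies below are illustrative assumptions only;
# examples use T5's text-to-text format (string input -> string target).

import random

VOCAB_X = ["red", "blue", "green"]   # hypothetical class-specific vocabulary X
VOCAB_Y = ["wug", "dax", "blick"]    # hypothetical class-specific vocabulary Y

def first_word_example(first_vocab, second_vocab):
    # Toy abstract rule: the target is always the FIRST input word,
    # regardless of which vocabulary that word is drawn from.
    w1, w2 = random.choice(first_vocab), random.choice(second_vocab)
    return {"input": f"first: {w1} {w2}", "target": w1}

# Training split: slot 1 draws from X, slot 2 from Y.
train = [first_word_example(VOCAB_X, VOCAB_Y) for _ in range(1000)]

# Flipped-vocabulary test: the class-specific vocabularies swap slots.
# A model that memorized "output the X-word" now fails systematically;
# a model that learned the abstract positional rule still succeeds.
test_flipped = [first_word_example(VOCAB_Y, VOCAB_X) for _ in range(200)]

# Zero-shot test: an entirely unseen vocabulary probes generalization
# to novel input identities.
VOCAB_Z = ["fep", "toma", "zup"]     # hypothetical held-out vocabulary
test_zero_shot = [first_word_example(VOCAB_Z, VOCAB_Z) for _ in range(200)]

print(train[0], test_flipped[0], test_zero_shot[0], sep="\n")
```

The contrast between the two test splits separates the two notions the abstract distinguishes: success on the flipped split requires dissociating the rule from specific input identities, while success on the zero-shot split additionally requires handling tokens never seen in training.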
