When a sentence does not introduce a discourse entity, Transformer-based models still sometimes refer to it

Paper title

When a sentence does not introduce a discourse entity, Transformer-based models still sometimes refer to it

Authors

Schuster, Sebastian; Linzen, Tal

Abstract

Understanding longer narratives or participating in conversations requires tracking the discourse entities that have been mentioned. Indefinite noun phrases (NPs), such as 'a dog', frequently introduce discourse entities, but this behavior is modulated by sentential operators such as negation. For example, 'a dog' in 'Arthur doesn't own a dog' does not introduce a discourse entity due to the presence of negation. In this work, we adapt the paradigm of psycholinguistic assessment of language models to higher-level linguistic phenomena and introduce an English evaluation suite that targets the knowledge of the interactions between sentential operators and indefinite NPs. We use this evaluation suite for a fine-grained investigation of the entity tracking abilities of the Transformer-based models GPT-2 and GPT-3. We find that while the models are to a certain extent sensitive to the interactions we investigate, they are all challenged by the presence of multiple NPs and their behavior is not systematic, which suggests that even models at the scale of GPT-3 do not fully acquire basic entity tracking abilities.
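The minimal-pair idea behind this kind of psycholinguistic evaluation can be sketched as follows. This is a hedged illustration, not the authors' evaluation suite or protocol. The assumption is that a model "knows" that negation blocks a discourse entity if it assigns a higher score to a continuation that refers back to the indefinite NP after the affirmative context than after the negated one. `MinimalPair`, `prefers_referent`, `accuracy` and `toy_score` are hypothetical names. The second example sentence and both continuations are invented for illustration. `toy_score` is a stand-in for a real language model, which would instead sum the token log-probabilities of the continuation under, for example, GPT-2.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical scorer type: (context, continuation) -> log-probability-like score.
Scorer = Callable[[str, str], float]


@dataclass
class MinimalPair:
    """Two contexts differing only in a sentential operator (here: negation),
    followed by the same continuation that refers back to the indefinite NP."""
    affirmative: str   # context that should introduce a discourse entity
    negated: str       # context that should not introduce one
    continuation: str  # sentence that refers back to the entity


def prefers_referent(score: Scorer, pair: MinimalPair) -> bool:
    """True if the referring continuation scores higher after the
    affirmative context than after the negated context."""
    return score(pair.affirmative, pair.continuation) > score(pair.negated, pair.continuation)


def accuracy(score: Scorer, pairs: List[MinimalPair]) -> float:
    """Fraction of minimal pairs on which the expected preference holds."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    return sum(prefers_referent(score, p) for p in pairs) / len(pairs)


def toy_score(context: str, continuation: str) -> float:
    """Hypothetical stand-in for an LM: a crude length-based base score,
    minus a fixed penalty when the context contains a negation marker."""
    penalty = -2.0 if ("n't" in context or " not " in context) else 0.0
    return -float(len(continuation.split())) + penalty


pairs = [
    MinimalPair("Arthur owns a dog.", "Arthur doesn't own a dog.", " The dog is brown."),
    MinimalPair("Mary has a car.", "Mary does not have a car.", " The car is red."),
]
print(accuracy(toy_score, pairs))  # 1.0
```

Because the toy scorer hard-codes a negation penalty, it trivially passes. The point of the design is that `accuracy` only depends on the `Scorer` interface, so a real model's scoring function can be swapped in unchanged. Contexts with several indefinite NPs can be expressed as additional minimal pairs.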
