Paper Title

Knowledge-Aware Language Model Pretraining

Paper Authors

Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary

Paper Abstract

How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task. Our experiments show that solely by adding these entity signals in pretraining, significantly more knowledge is packed into the transformer parameters: we observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing. We also show that our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models, significantly improving downstream tasks like zero-shot question-answering with no task-related training.
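To make the entity-signal idea concrete, here is a minimal illustrative sketch, not the paper's actual implementation: an "entity-extended" tokenizer pairs each surface token with an entity ID (0 for non-entities), so the model receives an entity signal alongside the token stream at the input, and the parallel entity sequence can supervise an additional entity prediction head at the output. The toy `ENTITY_VOCAB` dictionary and exact-match linking below are simplifying assumptions; a real system would use a full entity linker and a large entity vocabulary.

```python
# Toy entity vocabulary; 0 is reserved for "not an entity".
# In practice this would come from an entity linker over a large KB.
ENTITY_VOCAB = {"<none>": 0, "Seattle": 1, "Microsoft": 2}

def entity_extend(tokens):
    """Pair each token with an entity ID (0 = not an entity)."""
    return [(tok, ENTITY_VOCAB.get(tok, 0)) for tok in tokens]

pairs = entity_extend(["Microsoft", "is", "based", "in", "Seattle"])

# The token sequence feeds the usual language-modeling objective;
# the entity sequence supplies the extra entity prediction signal.
token_seq = [tok for tok, _ in pairs]
entity_seq = [eid for _, eid in pairs]
```

The key point of the sketch is that no architectural change is required: both signals are ordinary parallel input/output sequences that the same transformer can consume and predict.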
