Paper Title

Understanding Transformer Memorization Recall Through Idioms

Paper Authors

Adi Haviv, Ido Cohen, Jacob Gidron, Roei Schuster, Yoav Goldberg, Mor Geva

Paper Abstract

To produce accurate predictions, language models (LMs) must balance between generalization and memorization. Yet, little is known about the mechanism by which transformer LMs employ their memorization capacity. When does a model decide to output a memorized phrase, and how is this phrase then retrieved from memory? In this work, we offer the first methodological framework for probing and characterizing recall of memorized sequences in transformer LMs. First, we lay out criteria for detecting model inputs that trigger memory recall, and propose idioms as inputs that typically fulfill these criteria. Next, we construct a dataset of English idioms and use it to compare model behavior on memorized vs. non-memorized inputs. Specifically, we analyze the internal prediction construction process by interpreting the model's hidden representations as a gradual refinement of the output probability distribution. We find that across different model sizes and architectures, memorized predictions are a two-step process: early layers promote the predicted token to the top of the output distribution, and upper layers increase model confidence. This suggests that memorized information is stored and retrieved in the early layers of the network. Last, we demonstrate the utility of our methodology beyond idioms in memorized factual statements. Overall, our work makes a first step towards understanding memory recall, and provides a methodological basis for future studies of transformer memorization.
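The analysis described in the abstract treats each layer's hidden state as an intermediate readout of the output distribution. A minimal sketch of that idea is shown below; this is not the authors' code, and the model name, idiom prompt, and target token are illustrative assumptions. It projects every layer's hidden state through the output embedding (a "logit lens"-style readout) to track how the probability and rank of the memorized completion evolve across layers.

```python
# Sketch: read off the output distribution implied by each layer's hidden state.
# Assumptions: GPT-2 via HuggingFace transformers; prompt/target chosen for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any decoder-only transformer LM can be analyzed the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# An idiom prompt whose completion (" words") the model has likely memorized.
prompt = "Actions speak louder than"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# hidden_states: tuple of (n_layers + 1) tensors, each [batch, seq_len, hidden_dim]
hidden_states = out.hidden_states
unembed = model.get_output_embeddings().weight  # [vocab_size, hidden_dim]
final_ln = model.transformer.ln_f               # GPT-2's final layer norm

target_id = tok.encode(" words")[0]
for layer, h in enumerate(hidden_states):
    # Distribution implied by the last position's hidden state at this layer.
    logits = final_ln(h[0, -1]) @ unembed.T
    probs = torch.softmax(logits, dim=-1)
    rank = (probs > probs[target_id]).sum().item() + 1
    print(f"layer {layer:2d}: p(' words') = {probs[target_id]:.3f}, rank = {rank}")
```

Under the paper's finding, a memorized completion would typically reach rank 1 in the early layers, with its probability (the model's confidence) rising mainly in the upper layers.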
