Paper Title

Revision Transformers: Instructing Language Models to Change their Values

Authors

Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting

Abstract

Current transformer language models (LMs) are large-scale models with billions of parameters. They have been shown to provide high performance on a variety of tasks but are also prone to shortcut learning and bias. Addressing such incorrect model behavior via parameter adjustments is very costly. This is particularly problematic for updating dynamic concepts, such as moral values, which vary culturally or interpersonally. In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating. The specific combination of a large-scale pre-trained LM, which inherently but diffusely encodes world knowledge, with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and the help of user interaction. We exemplify RiT on a moral dataset and simulate user feedback, demonstrating strong performance in model revision even with little data. This way, users can easily adapt a model to their preferences, paving the way for more transparent AI models.
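The abstract's core idea of pairing a frozen pre-trained LM with an external revision engine can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the `RevisionEngine` class and word-overlap similarity are stand-ins for the retrieval component (a real system would use sentence embeddings), and the stored corrections are injected into the prompt so behavior changes without any parameter updates.

```python
# Hypothetical sketch of the RiT idea (not the paper's code): a frozen LM is
# paired with an external "revision engine" that stores user corrections and
# injects the most relevant one into the prompt, so model behavior can be
# revised without retraining.

class RevisionEngine:
    """Stores user-provided revisions as (situation, corrected_judgment) pairs."""

    def __init__(self):
        self.revisions = []  # list of (situation, judgment) tuples

    def add(self, situation, judgment):
        self.revisions.append((situation, judgment))

    def retrieve(self, query, threshold=0.3):
        """Return the stored revision most similar to the query, or None.

        Uses simple word-overlap (Jaccard) similarity as a stand-in for the
        sentence-embedding similarity a real retrieval engine would use.
        """
        q = set(query.lower().split())
        best, best_score = None, 0.0
        for situation, judgment in self.revisions:
            s = set(situation.lower().split())
            score = len(q & s) / len(q | s) if q | s else 0.0
            if score > best_score:
                best, best_score = (situation, judgment), score
        return best if best_score >= threshold else None


def build_prompt(engine, query):
    """Prepend the retrieved revision (if any) to steer the frozen LM."""
    hit = engine.retrieve(query)
    if hit is None:
        return f"Question: {query}\nAnswer:"
    situation, judgment = hit
    return (f"Revision: '{situation}' -> {judgment}\n"
            f"Question: {query}\nAnswer:")


# A user corrects the model once; the correction is reused at inference time.
engine = RevisionEngine()
engine.add("lying to a friend", "morally wrong")
print(build_prompt(engine, "Is lying to a friend acceptable?"))
```

Because the revisions live outside the model parameters, adding, inspecting, or deleting a correction is a cheap data operation, which is what makes this design more transparent and user-adaptable than fine-tuning.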
