Paper Title

Omnigrok: Grokking Beyond Algorithmic Data

Authors

Ziming Liu, Eric J. Michaud, Max Tegmark

Abstract

Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. We refer to this as the "LU mechanism" because training and test losses (against model weight norm) typically resemble "L" and "U", respectively. This simple mechanism can nicely explain many aspects of grokking: data size dependence, weight decay dependence, the emergence of representations, etc. Guided by the intuitive picture, we are able to induce grokking on tasks involving images, language and molecules. In the reverse direction, we are able to eliminate grokking for algorithmic datasets. We attribute the dramatic nature of grokking for algorithmic datasets to representation learning.
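The "LU mechanism" described above can be sketched numerically. The toy curves below are purely illustrative assumptions (not the paper's code or measured losses): an "L"-shaped train loss that flattens near zero once the weight norm is large enough, and a "U"-shaped test loss minimized at an intermediate weight norm `w_star`. Under this picture, grokking corresponds to the weight norm slowly drifting (e.g. via weight decay) from a large overfitting value toward the bottom of the test-loss "U", long after the train loss has flattened.

```python
# Illustrative toy sketch of the "LU" picture; the closed-form curves
# below are hypothetical stand-ins for measured train/test losses.
import numpy as np

w = np.linspace(0.1, 10, 200)  # range of model weight norms

# Train loss: "L"-shaped -- decays quickly, then stays flat near zero.
train_loss = np.exp(-2.0 * w)

# Test loss: "U"-shaped -- same fast decay, plus a penalty that grows
# away from a hypothetical generalizing weight norm w_star.
w_star = 2.0
test_loss = np.exp(-2.0 * w) + 0.05 * (w - w_star) ** 2

# The test-optimal weight norm sits at the bottom of the "U", at an
# intermediate value, while train loss keeps (weakly) preferring
# larger norms -- the mismatch the abstract identifies.
best_w = w[np.argmin(test_loss)]
print(f"test-optimal weight norm ~ {best_w:.1f}")
```

The key qualitative point is that the two curves agree on the steep left wall but disagree everywhere to its right, so a model that overfits at a large weight norm must travel a long, nearly flat stretch of the train-loss landscape before reaching the test-loss minimum.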
