Paper Title
Memorization-Dilation: Modeling Neural Collapse Under Label Noise
Paper Authors
Paper Abstract
The notion of neural collapse refers to several emergent phenomena that have been empirically observed across various canonical classification problems. During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation, and the features of different classes tend to separate as much as possible. Neural collapse is often studied through a simplified model, called the unconstrained feature representation, in which the model is assumed to have "infinite expressivity" and can map each data point to any arbitrary representation. In this work, we propose a more realistic variant of the unconstrained feature representation that takes the limited expressivity of the network into account. Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse. Using a model of the memorization-dilation (M-D) phenomenon, we show one mechanism by which different losses lead to different performances of the trained network on noisy data. Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
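For concreteness, the display below is a minimal sketch of the standard label-smoothing variant of cross-entropy referenced in the abstract; the notation (K classes, smoothing parameter \alpha, one-hot label y, softmax output p) is generic textbook notation and is not necessarily the notation used in the paper itself.

\[
\tilde{y}_k = (1-\alpha)\, y_k + \frac{\alpha}{K},
\qquad
\mathcal{L}_{\mathrm{LS}}(p, y) = -\sum_{k=1}^{K} \tilde{y}_k \log p_k .
\]

Setting \alpha = 0 recovers the usual cross-entropy loss; \alpha > 0 spreads a small amount of probability mass uniformly over all classes, which is the regularization effect the paper analyzes through the memorization-dilation model.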