Paper Title
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks
Paper Authors
Paper Abstract
To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical guarantees of their convergence under gradient flow training as well as their approximation and generalization capabilities. In this work, we study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed. To define the limiting model rigorously, we generalize the MF theory of two-layer NNs by treating the neurons as belonging to functional spaces. Then, by writing the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive-definite, we prove that its training loss in $L_2$ regression decays to zero at a linear rate. Furthermore, we define function spaces that include the solutions obtainable through the MF training dynamics and prove Rademacher complexity bounds for these spaces. Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors while both exhibiting feature learning.
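To make the setting concrete, the following is a minimal sketch (not the paper's code; the widths and the plain 1/m mean-field normalization are illustrative assumptions and may differ from the paper's scaling choices) of a partially-trained three-layer network: the first-layer weights are sampled once and frozen, while only the second hidden layer and the output weights receive gradient updates on the squared loss.

import numpy as np

rng = np.random.default_rng(0)
d, m1, m2 = 10, 512, 512  # input dim, width of the fixed layer, width of the trained layer (hypothetical values)

W1 = rng.normal(size=(m1, d)) / np.sqrt(d)   # first layer: random and kept fixed throughout training
W2 = rng.normal(size=(m2, m1))               # second hidden layer: trainable
a  = rng.normal(size=m2)                     # output weights: trainable

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    h1 = relu(W1 @ x)          # fixed random features
    z2 = W2 @ h1 / m1          # mean-field average over first-layer neurons
    return a @ relu(z2) / m2   # mean-field average over second-layer neurons

def sgd_step(x, y, lr=0.1):
    # One gradient step on 0.5 * (f(x) - y)**2; W1 is never updated.
    global W2, a
    h1 = relu(W1 @ x)
    z2 = W2 @ h1 / m1
    h2 = relu(z2)
    err = a @ h2 / m2 - y
    grad_a = err * h2 / m2
    grad_W2 = err * np.outer(a * (z2 > 0) / m2, h1 / m1)
    a -= lr * grad_a
    W2 -= lr * grad_W2

As a rough heuristic for the linear-rate claim (standard kernel-gradient-flow reasoning, not the paper's exact argument), if the residual $r_t$ on the training data evolves as $\dot{r}_t = -K_t r_t$ for a kernel $K_t$ whose smallest eigenvalue stays above some $\lambda > 0$, then $\frac{d}{dt}\tfrac{1}{2}\|r_t\|^2 = -r_t^\top K_t r_t \le -2\lambda \cdot \tfrac{1}{2}\|r_t\|^2$, so the training loss satisfies $\mathcal{L}(t) \le e^{-2\lambda t}\,\mathcal{L}(0)$.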