Paper Title

On the exact computation of linear frequency principle dynamics and its generalization

Authors

Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang

Abstract

Recent works show an intriguing phenomenon, the Frequency Principle (F-Principle): deep neural networks (DNNs) fit the target function from low to high frequency during training, which provides insight into the training and generalization behavior of DNNs in complex tasks. In this paper, through an analysis of an infinite-width two-layer NN in the neural tangent kernel (NTK) regime, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain during training. Our exact computation applies to general activation functions with no assumptions on the size and distribution of the training data. This LFP model reveals that higher frequencies evolve polynomially or exponentially slower than lower frequencies, depending on the smoothness/regularity of the activation function. We further bridge the gap between training dynamics and generalization by proving that the LFP model implicitly minimizes a Frequency-Principle norm (FP-norm) of the learned function, under which higher frequencies are more heavily penalized, with weights given by the inverse of their evolution rate. Finally, we derive an a priori generalization error bound controlled by the FP-norm of the target function, which provides a theoretical justification for the empirical observation that DNNs often generalize well on low-frequency functions.
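
As a minimal illustrative sketch of the qualitative behavior described above (not the exact LFP coefficients derived in the paper), the Python snippet below assumes a linearized gradient flow in which each frequency component of the residual decays independently, with a toy polynomially decaying rate rate(xi) = (1 + |xi|)^(-p). The frequencies, the exponent p, and the time grid are illustrative assumptions chosen only to show how lower frequencies converge first.

import numpy as np

# Toy sketch (assumed rate, not the paper's exact LFP coefficients): under a
# linearized (NTK-regime) gradient flow, each frequency component of the
# residual obeys d/dt r(xi, t) = -rate(xi) * r(xi, t),
# so r(xi, t) = r(xi, 0) * exp(-rate(xi) * t).
xis = np.array([1.0, 4.0, 16.0])         # low, medium, and high frequency
p = 3.0                                  # assumed polynomial decay exponent
rate = (1.0 + xis) ** (-p)               # slower decay rate for higher |xi|
r0 = np.ones_like(xis)                   # initial residual per frequency
for t in [0.0, 10.0, 100.0, 1000.0]:
    residual = r0 * np.exp(-rate * t)    # closed-form solution of the linear ODE
    print(f"t = {t:7.1f}   residual per frequency: {residual}")

The printout shows the low-frequency residual vanishing far earlier than the high-frequency one, i.e. the low-to-high fitting order stated by the F-Principle; in the paper the corresponding rate is derived exactly from the activation function rather than assumed.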
