Paper Title
Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network
Paper Authors
Paper Abstract
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the $L_2$ estimation error with respect to the GD iterations, which stays bounded away from zero without a delicate early stopping scheme. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with $\ell_2$ regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of the $L_2$ estimation error can be achieved. Numerical experiments confirm our theory and further demonstrate that the $\ell_2$ regularization approach improves training robustness and works for a wider range of neural networks.
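The following minimal sketch (not the authors' code) illustrates the setting described in the abstract: full-batch GD with an $\ell_2$ penalty on the weights of an overparametrized one-hidden-layer ReLU network, fit to noisy regression data. The target function, noise level, network width, step size, and penalty strength are all illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch: l2-regularized full-batch GD on an overparametrized
# one-hidden-layer ReLU network for noisy nonparametric regression.
# All hyperparameters below are assumed for demonstration purposes.
import numpy as np

rng = np.random.default_rng(0)

# Noisy training data: y_i = f(x_i) + eps_i  (hypothetical target f(x) = sin(pi x))
n = 50
x = np.sort(rng.uniform(-1.0, 1.0, size=(n, 1)), axis=0)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal((n, 1))

# One-hidden-layer ReLU network: f(x) = a^T relu(W x + b) / sqrt(m), width m >> n
m = 2000
W = rng.standard_normal((m, 1))
b = rng.standard_normal((m, 1))
a = rng.standard_normal((m, 1))

def forward(x):
    h = np.maximum(W @ x.T + b, 0.0)        # (m, n) hidden activations
    return (a.T @ h).T / np.sqrt(m)         # (n, 1) network outputs

lr, lam, steps = 0.1, 1e-3, 2000            # step size and l2 penalty (assumed)
for _ in range(steps):
    h = np.maximum(W @ x.T + b, 0.0)        # (m, n)
    pred = (a.T @ h).T / np.sqrt(m)         # (n, 1)
    r = pred - y                            # residuals
    # Gradients of L = (1/n) * sum_i (f(x_i) - y_i)^2
    #                + lam * (||W||^2 + ||b||^2 + ||a||^2)
    grad_a = 2.0 * (h @ r) / (np.sqrt(m) * n) + 2.0 * lam * a
    mask = (h > 0).astype(float)            # ReLU derivative
    grad_h = (a @ r.T) / np.sqrt(m) * mask  # (m, n) backprop through hidden layer
    grad_W = 2.0 * (grad_h @ x) / n + 2.0 * lam * W
    grad_b = 2.0 * grad_h.sum(axis=1, keepdims=True) / n + 2.0 * lam * b
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

print("train MSE:", float(np.mean((forward(x) - y) ** 2)))
```

In this regime one would compare the trained network's predictions against kernel ridge regression with the network's neural tangent kernel; the $\ell_2$ penalty plays the role of the ridge parameter, which is the correspondence the paper analyzes.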