Paper Title
Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network
Paper Authors
Paper Abstract
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the $L_2$ estimation error with respect to the GD iterations, which stays bounded away from zero without a delicate early stopping scheme. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with $\ell_2$ regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of the $L_2$ estimation error can be achieved. Numerical experiments confirm our theory and further demonstrate that the $\ell_2$ regularization approach improves training robustness and works for a wider range of neural networks.
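The following minimal sketch (not the authors' code) illustrates the setting described in the abstract: full-batch GD with an $\ell_2$ penalty on the weights of an overparametrized one-hidden-layer ReLU network, fit to noisy regression data. The target function, noise level, network width, step size, and penalty strength are all illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch: l2-regularized full-batch GD on an overparametrized
# one-hidden-layer ReLU network for noisy nonparametric regression.
# All hyperparameters below are assumed for demonstration purposes.
import numpy as np

rng = np.random.default_rng(0)

# Noisy training data: y_i = f(x_i) + eps_i  (hypothetical target f(x) = sin(pi x))
n = 50
x = np.sort(rng.uniform(-1.0, 1.0, size=(n, 1)), axis=0)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal((n, 1))

# One-hidden-layer ReLU network: f(x) = a^T relu(W x + b) / sqrt(m), width m >> n
m = 2000
W = rng.standard_normal((m, 1))
b = rng.standard_normal((m, 1))
a = rng.standard_normal((m, 1))

def forward(x):
    h = np.maximum(W @ x.T + b, 0.0)        # (m, n) hidden activations
    return (a.T @ h).T / np.sqrt(m)         # (n, 1) network outputs

lr, lam, steps = 0.1, 1e-3, 2000            # step size and l2 penalty (assumed)
for _ in range(steps):
    h = np.maximum(W @ x.T + b, 0.0)        # (m, n)
    pred = (a.T @ h).T / np.sqrt(m)         # (n, 1)
    r = pred - y                            # residuals
    # Gradients of L = (1/n) * sum_i (f(x_i) - y_i)^2
    #                + lam * (||W||^2 + ||b||^2 + ||a||^2)
    grad_a = 2.0 * (h @ r) / (np.sqrt(m) * n) + 2.0 * lam * a
    mask = (h > 0).astype(float)            # ReLU derivative
    grad_h = (a @ r.T) / np.sqrt(m) * mask  # (m, n) backprop through hidden layer
    grad_W = 2.0 * (grad_h @ x) / n + 2.0 * lam * W
    grad_b = 2.0 * grad_h.sum(axis=1, keepdims=True) / n + 2.0 * lam * b
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

print("train MSE:", float(np.mean((forward(x) - y) ** 2)))
```

In this regime one would compare the trained network's predictions against kernel ridge regression with the network's neural tangent kernel; the $\ell_2$ penalty plays the role of the ridge parameter, which is the correspondence the paper analyzes.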