Paper Title
Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK
Paper Authors
Paper Abstract
We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomial samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a polynomial number of samples in $d$, has population loss at least $\Omega(1/d)$.
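Below is a minimal sketch of the learning setup described in the abstract: a "teacher" $f^{\star}(x) = a^{\top}|W^{\star}x|$ with orthonormal $W^{\star}$ and nonnegative $a$, Gaussian inputs, and an over-parametrized two-layer ReLU "student" trained by gradient descent from random initialization. It is illustrative only; the dimensions, width, learning rate, and initialization scale are assumptions and do not reproduce the paper's specific algorithm or its theoretical guarantees.

```python
import torch

torch.manual_seed(0)

d, m, n = 10, 200, 5000  # input dim, hidden width (over-parametrized), sample size (assumed values)

# Ground-truth ("teacher") parameters: orthonormal W* and nonnegative second layer a
W_star, _ = torch.linalg.qr(torch.randn(d, d))  # orthonormal matrix
a_star = torch.rand(d)                          # nonnegative vector

def f_star(x):
    # f*(x) = a^T |W* x|, with entrywise absolute value
    return (x @ W_star.T).abs() @ a_star

# Training data: Gaussian inputs labeled by the teacher
X = torch.randn(n, d)
y = f_star(X)

# Over-parametrized two-layer ReLU student, randomly initialized
class TwoLayerReLU(torch.nn.Module):
    def __init__(self, d, m):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(m, d) / d**0.5)
        self.b = torch.nn.Parameter(torch.randn(m) / m**0.5)

    def forward(self, x):
        return torch.relu(x @ self.W.T) @ self.b

model = TwoLayerReLU(d, m)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = torch.mean((model(X) - y) ** 2)  # empirical squared loss
    loss.backward()
    opt.step()

# Estimate the population loss on fresh Gaussian samples
X_test = torch.randn(20000, d)
test_loss = torch.mean((model(X_test) - f_star(X_test)) ** 2)
print(f"train loss {loss.item():.4f}, estimated population loss {test_loss.item():.4f}")
```

Note that $|z| = \mathrm{ReLU}(z) + \mathrm{ReLU}(-z)$, so the teacher is exactly representable by a two-layer ReLU network; the sketch trains a wider student on finite samples and evaluates it on held-out Gaussian data as a proxy for the population loss.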