Paper Title

Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent

论文作者

Kohler, Michael, Krzyzak, Adam

Paper Abstract

Estimation of a regression function from independent and identically distributed random variables is considered. The $L_2$ error with integration with respect to the design measure is used as the error criterion. Over-parametrized deep neural network estimates are defined in which all the weights are learned by gradient descent. It is shown that the expected $L_2$ error of these estimates converges to zero at a rate close to $n^{-1/(1+d)}$ in the case that the regression function is Hölder smooth with Hölder exponent $p \in [1/2,1]$. In the case of an interaction model, where the regression function is assumed to be a sum of Hölder smooth functions each of which depends on only $d^*$ of the $d$ components of the design variable, it is shown that these estimates achieve the corresponding $d^*$-dimensional rate of convergence.
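To make the estimation setting concrete, the sketch below fits an over-parametrized fully connected ReLU network to i.i.d. regression data by plain gradient descent and estimates the $L_2$ error by Monte Carlo integration with respect to the design measure. This is only an illustrative toy setup, not the authors' network construction or theorem assumptions; the toy regression function, network width and depth, learning rate, and iteration count are assumptions chosen for the example.

```python
# Minimal illustrative sketch (not the paper's construction): over-parametrized
# ReLU network regression trained by gradient descent, with the L2 error
# estimated by Monte Carlo integration over the design measure.
import torch

torch.manual_seed(0)

d = 2                                   # input dimension (illustrative)
n = 200                                 # sample size (illustrative)
m = lambda x: torch.sin(x.sum(dim=1))   # toy Hölder-smooth regression function

# i.i.d. design points X ~ Uniform([0,1]^d) and noisy responses Y = m(X) + noise
X = torch.rand(n, d)
Y = m(X) + 0.1 * torch.randn(n)

# Over-parametrized network: the number of weights far exceeds the sample size n.
net = torch.nn.Sequential(
    torch.nn.Linear(d, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 1),
)

# Plain (full-batch) gradient descent on the empirical L2 risk.
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
for _ in range(5000):
    opt.zero_grad()
    loss = ((net(X).squeeze(1) - Y) ** 2).mean()
    loss.backward()
    opt.step()

# Monte Carlo estimate of the L2 error  ∫ |net(x) - m(x)|^2 P_X(dx).
X_test = torch.rand(100_000, d)
with torch.no_grad():
    l2_error = ((net(X_test).squeeze(1) - m(X_test)) ** 2).mean()
print(f"estimated L2 error: {l2_error.item():.4f}")
```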
