Paper Title

Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent

论文作者

Kohler, Michael, Krzyzak, Adam

Paper Abstract

Estimation of a regression function from independent and identically distributed random variables is considered. The $L_2$ error with integration with respect to the design measure is used as the error criterion. Over-parametrized deep neural network estimates are defined in which all the weights are learned by gradient descent. It is shown that the expected $L_2$ error of these estimates converges to zero at a rate close to $n^{-1/(1+d)}$ in the case that the regression function is Hölder smooth with Hölder exponent $p \in [1/2,1]$. In the case of an interaction model, where the regression function is assumed to be a sum of Hölder smooth functions each of which depends on only $d^*$ of the $d$ components of the design variable, it is shown that these estimates achieve the corresponding $d^*$-dimensional rate of convergence.
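To make the estimation setting concrete, the sketch below fits an over-parametrized fully connected ReLU network to i.i.d. regression data by plain gradient descent and estimates the $L_2$ error by Monte Carlo integration with respect to the design measure. This is only an illustrative toy setup, not the authors' network construction or theorem assumptions; the toy regression function, network width and depth, learning rate, and iteration count are assumptions chosen for the example.

```python
# Minimal illustrative sketch (not the paper's construction): over-parametrized
# ReLU network regression trained by gradient descent, with the L2 error
# estimated by Monte Carlo integration over the design measure.
import torch

torch.manual_seed(0)

d = 2                                   # input dimension (illustrative)
n = 200                                 # sample size (illustrative)
m = lambda x: torch.sin(x.sum(dim=1))   # toy Hölder-smooth regression function

# i.i.d. design points X ~ Uniform([0,1]^d) and noisy responses Y = m(X) + noise
X = torch.rand(n, d)
Y = m(X) + 0.1 * torch.randn(n)

# Over-parametrized network: the number of weights far exceeds the sample size n.
net = torch.nn.Sequential(
    torch.nn.Linear(d, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 1),
)

# Plain (full-batch) gradient descent on the empirical L2 risk.
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
for _ in range(5000):
    opt.zero_grad()
    loss = ((net(X).squeeze(1) - Y) ** 2).mean()
    loss.backward()
    opt.step()

# Monte Carlo estimate of the L2 error  ∫ |net(x) - m(x)|^2 P_X(dx).
X_test = torch.rand(100_000, d)
with torch.no_grad():
    l2_error = ((net(X_test).squeeze(1) - m(X_test)) ** 2).mean()
print(f"estimated L2 error: {l2_error.item():.4f}")
```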
