Paper Title

On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems

Paper Authors

Constantin Christof, Julia Kowalczyk

Paper Abstract

We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional setting. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., $L^p$-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.
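
As a hedged illustration of the mechanism described in the abstract (the squared loss, the ReLU activation, the domain $[0,1]$, and the target $f(x) = x^2$ are concrete choices made here for exposition; the paper itself covers general differentiable losses and arbitrary continuous nonpolynomial activations), consider the $L^2$-training problem

$$\min_{\theta} \; \frac{1}{2} \int_0^1 \bigl(\psi(\theta, x) - f(x)\bigr)^2 \,\mathrm{d}x, \qquad f(x) = x^2,$$

where $\psi(\theta, \cdot)$ denotes the function realized by the network with parameters $\theta$. If $\theta$ is chosen so that every hidden neuron operates strictly inside the affine segment of its activation (for the ReLU, with pre-activations bounded away from zero on $[0,1]$), then $\psi(\theta, \cdot)$ is affine on $[0,1]$ and stays affine under all sufficiently small perturbations of $\theta$. A parameter vector of this type that realizes the $L^2$-best affine approximation of $f$ is therefore a local minimum, while the universal approximation theorem guarantees that the non-affine target $f$ can be approximated strictly better by other parameter configurations, so this local minimum is spurious; over-parameterized representations of the same affine function then yield a continuum of such points.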
