Paper Title

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

Paper Authors

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

Paper Abstract

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $ε$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(δ, ε)$-stationarity, which allows for an $ε$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $δ$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(δ, ε)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $δ$. Empirically, our methods perform well for training ReLU neural networks.
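
The $(δ, ε)$-stationarity notion above admits a compact formal statement. The following is a reconstruction based only on the abstract's wording, with $\partial f$ standing for the set of generalized (Clarke) gradients; the paper's exact definition may differ in detail.

```latex
% x is (delta, epsilon)-stationary for f if an epsilon-approximate gradient
% exists among convex combinations of generalized gradients evaluated at
% points within distance delta of x:
\[
  \min\Bigl\{\, \|g\| \;:\;
    g \in \operatorname{conv}\Bigl(\textstyle\bigcup_{\|y - x\| \le \delta} \partial f(y)\Bigr) \Bigr\}
  \;\le\; \epsilon .
\]
% Setting delta = 0 recovers the usual notion of epsilon-stationarity at x,
% which the abstract shows first-order methods cannot reach in finite time.
```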

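To make "randomized first-order method" concrete, below is a minimal Python sketch of one natural instance: a subgradient step taken at a point sampled uniformly from the $δ$-ball around the current iterate, so that the gradients used are exactly of the kind $(δ, ε)$-stationarity refers to. This is an illustration under our own assumptions, not the authors' algorithm; `subgrad` is a hypothetical oracle returning one generalized gradient of $f$.

```python
import numpy as np

def randomized_subgradient_method(subgrad, x0, delta, step_size, n_iters, seed=None):
    """Randomized first-order method in the spirit of the abstract: each
    step queries a generalized gradient at a point sampled uniformly from
    the delta-ball around the current iterate.

    Illustrative sketch only -- NOT the paper's algorithm. `subgrad` is a
    hypothetical oracle returning one generalized gradient of f.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Sample u uniformly from the ball of radius delta:
        # random direction, radius delta * U^(1/d).
        u = rng.normal(size=x.shape)
        u *= delta * rng.random() ** (1.0 / x.size) / np.linalg.norm(u)
        g = subgrad(x + u)        # generalized gradient near x
        x = x - step_size * g     # standard (sub)gradient step
    return x

# Example on the nonsmooth f(x) = ||x||_1, whose generalized gradient at
# any differentiable point is sign(x).
x_final = randomized_subgradient_method(
    np.sign, x0=np.array([3.0, -2.0]), delta=0.1, step_size=0.05, n_iters=200)
print(x_final)
```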