Paper Title

Escaping Saddle-Points Faster under Interpolation-like Conditions

Paper Authors

Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra

Paper Abstract

In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $ε$-local-minimizer matches the corresponding deterministic rate of $\tilde{\mathcal{O}}(1/ε^{2})$. We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions and show that the oracle complexity to reach an $ε$-local-minimizer is $\tilde{\mathcal{O}}(1/ε^{2.5})$. While this complexity is better than the corresponding complexity of either PSGD or SCRN without interpolation-like assumptions, it does not match the rate of $\tilde{\mathcal{O}}(1/ε^{1.5})$ achieved by the deterministic Cubic-Regularized Newton method. It appears that further Hessian-based interpolation-like assumptions are necessary to bridge this gap. We also discuss the corresponding improved complexities in the zeroth-order setting.
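The abstract refers to the Perturbed Stochastic Gradient Descent (PSGD) algorithm. The sketch below is a rough illustration of the underlying idea, not the authors' exact method: run stochastic gradient steps and occasionally inject isotropic noise so that strict saddle points are escaped. The names and hyperparameters (`grad_fn`, `step_size`, `noise_radius`, `perturb_every`) are hypothetical placeholders; in the paper these quantities are tied to problem constants and to the interpolation-like assumptions on the stochastic gradients.

```python
import numpy as np

def perturbed_sgd(grad_fn, x0, step_size=1e-2, noise_radius=1e-3,
                  perturb_every=100, n_iters=10_000, rng=None):
    """Minimal sketch of perturbed stochastic gradient descent.

    grad_fn(x) should return a stochastic gradient estimate at x
    (e.g. a mini-batch gradient of the training loss). The periodic
    isotropic perturbation is the standard device for escaping strict
    saddle points; the schedule and radius used here are illustrative,
    not the choices analyzed in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for t in range(n_iters):
        g = grad_fn(x)                      # one stochastic first-order oracle call
        x -= step_size * g                  # plain SGD step
        if t % perturb_every == 0:          # periodic perturbation (hypothetical schedule)
            x += noise_radius * rng.standard_normal(x.shape)
    return x
```

For instance, `grad_fn` could return the gradient of an over-parametrized model's loss on a randomly sampled mini-batch; in the interpolation regime such gradients vanish at the minimizers of the full objective, which is what drives the improved oracle complexities stated above.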
