Paper Title


Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality

Authors

Ilyas Fatkhullin, Jalal Etesami, Niao He, Negar Kiyavash

Abstract


We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies the global Kurdyka-Łojasiewicz (KL) inequality and queries to the stochastic gradient oracle satisfy a mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics in this setting. As a byproduct of our analysis, we obtain a sample complexity of $\mathcal{O}(\epsilon^{-(4-\alpha)/\alpha})$ for SGD when the objective satisfies the so-called $\alpha$-PL condition, where $\alpha$ is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of $\mathcal{O}(\epsilon^{-2/\alpha})$ when the objective satisfies an average smoothness assumption. This leads to the first optimal algorithm for the important case $\alpha = 1$, which arises in applications such as policy optimization in reinforcement learning.
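To make the setting concrete, here is a plausible formalization of the global KL ($\alpha$-PL) condition the abstract refers to, written in the standard gradient-domination form; the paper's exact normalization and constants may differ.

```latex
% Global KL / alpha-PL condition (standard gradient-domination form;
% a sketch -- the paper's exact constants may differ): there exist
% mu > 0 and alpha in [1, 2] such that for all x,
\[
  \|\nabla f(x)\|^{\alpha} \;\ge\; 2\mu \,\bigl(f(x) - f^{*}\bigr).
\]
% alpha = 2 recovers the classical Polyak-Lojasiewicz (PL) condition, where
% SGD's O(eps^{-(4-alpha)/alpha}) rate reduces to the familiar O(eps^{-1});
% alpha = 1 is the weak gradient domination arising in policy optimization,
% where SGD gives O(eps^{-3}) and PAGER improves this to the optimal O(eps^{-2}).
```

The PAGER method mentioned in the abstract combines a PAGE-style probabilistic variance-reduced gradient estimator with restarting. The sketch below illustrates that estimator on a hypothetical least-squares problem; the oracle interface (`grad_on`), batch sizes, step size, and restart schedule are illustrative assumptions, not the paper's exact algorithm or parameter choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem for illustration only: least squares on random data.
A = rng.standard_normal((1000, 20))
b = rng.standard_normal(1000)

def grad_on(x, idx):
    """Average gradient of 0.5*||Ax - b||^2 / n over the sampled rows idx."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def page_stage(x, steps, lr, p, big, small):
    """One restart stage of a PAGE-style variance-reduced SGD (a minimal
    sketch of the estimator PAGER builds on; the stage-wise parameter
    schedule across restarts is omitted)."""
    n = A.shape[0]
    g = grad_on(x, rng.choice(n, big, replace=False))  # large-batch anchor
    for _ in range(steps):
        x_new = x - lr * g                             # plain gradient step
        if rng.random() < p:
            # occasional large-batch refresh of the estimator
            g = grad_on(x_new, rng.choice(n, big, replace=False))
        else:
            # recursive update: correct the previous estimate with the
            # gradient difference on the *same* small batch at both points
            idx = rng.choice(n, small, replace=False)
            g = g + grad_on(x_new, idx) - grad_on(x, idx)
        x = x_new
    return x

# Restarting loop: run several stages, each warm-started from the last.
x = np.zeros(20)
for stage in range(5):
    x = page_stage(x, steps=200, lr=0.1, p=0.05, big=1000, small=32)
```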
