Paper Title
Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities
Paper Authors
Paper Abstract
In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first consider a deterministic version of the problem. We design and analyze the Zeroth-Order Gradient Descent Ascent (\texttt{ZO-GDA}) algorithm, and provide improved results compared to existing works in terms of oracle complexity. We also propose the Zeroth-Order Gradient Descent Multi-Step Ascent (\texttt{ZO-GDMSA}) algorithm, which significantly improves upon the oracle complexity of \texttt{ZO-GDA}. We then consider stochastic versions of \texttt{ZO-GDA} and \texttt{ZO-GDMSA} to handle stochastic nonconvex minimax problems. For this case, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which is known to be satisfied by modern over-parametrized machine learning models. We establish that under the SGC assumption, the complexities of the stochastic algorithms match those of the deterministic algorithms. Numerical experiments are presented to support our theoretical results.
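To make the scheme concrete, below is a minimal sketch of a zeroth-order gradient descent ascent loop of the kind the abstract describes: gradients in each variable are replaced by two-point Gaussian-smoothing finite-difference estimates, a descent step is taken in the nonconvex variable and an ascent step in the strongly concave variable. The estimator, the toy objective, and all step sizes and smoothing parameters here are illustrative choices, not the paper's actual algorithmic parameters or complexity-optimal settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, z, mu=1e-4, n_samples=20):
    """Two-point Gaussian-smoothing gradient estimate of f at z.

    Averages (f(z + mu*u) - f(z - mu*u)) / (2*mu) * u over random
    Gaussian directions u. (Illustrative estimator; the paper's exact
    estimator and parameter choices may differ.)
    """
    d = z.shape[0]
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        g += (f(z + mu * u) - f(z - mu * u)) / (2 * mu) * u
    return g / n_samples

def zo_gda(f, x0, y0, eta_x=0.01, eta_y=0.05, iters=2000):
    """Zeroth-order gradient descent (in x) ascent (in y) on f(x, y)."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        gx = zo_grad(lambda xv: f(xv, y), x)  # estimate of grad_x f
        gy = zo_grad(lambda yv: f(x, yv), y)  # estimate of grad_y f
        x = x - eta_x * gx  # descent step on the min variable
        y = y + eta_y * gy  # ascent step on the strongly concave max variable
    return x, y

# Toy problem: nonconvex in x (quartic), strongly concave in y.
f = lambda x, y: (x[0] ** 2 - 1.0) ** 2 + x[0] * y[0] - y[0] ** 2
x, y = zo_gda(f, np.array([0.5]), np.array([0.0]))
# At a stationary point, y tracks the inner maximizer y* = x / 2.
```

The multi-step variant (\texttt{ZO-GDMSA}) would instead run several inner ascent iterations on `y` before each descent step on `x`, exploiting strong concavity in `y` to solve the inner maximization more accurately.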