Paper Title
Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors
Paper Authors
Paper Abstract
The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for establishing linear convergence of gradient descent, even in non-convex settings. While several recent works use a PL-based analysis to establish linear convergence of stochastic gradient descent methods, the question remains as to whether a similar analysis can be conducted for more general optimization methods. In this work, we present a PL-based analysis for linear convergence of generalized mirror descent (GMD), a generalization of mirror descent with a possibly time-dependent mirror. GMD subsumes popular first-order optimization methods including gradient descent, mirror descent, and preconditioned gradient descent methods such as Adagrad. Since the standard PL analysis cannot be extended naturally from GMD to stochastic GMD, we present a Taylor-series-based analysis to establish sufficient conditions for linear convergence of stochastic GMD. As a corollary, our result establishes sufficient conditions and provides learning rates for linear convergence of stochastic mirror descent and Adagrad. Lastly, for functions that are locally PL*, our analysis implies the existence of an interpolating solution and convergence of GMD to this solution.
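For orientation, the two objects the abstract refers to can be sketched in standard notation; the exact formulation and constants used in the paper may differ, so the block below is only an illustrative summary, not the paper's definitions.

```latex
% Illustrative sketch in standard notation (assumed, not taken verbatim from the paper).
% PL inequality: the squared gradient norm dominates the suboptimality gap.
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu \bigl( f(x) - f^{*} \bigr),
  \qquad \mu > 0 .
\]
% Generalized mirror descent (GMD) with a time-dependent mirror (potential) \psi_t:
\[
  \nabla \psi_t(x_{t+1}) \;=\; \nabla \psi_t(x_t) \;-\; \eta_t \,\nabla f(x_t).
\]
% Special cases: \psi_t(x) = \tfrac{1}{2}\lVert x \rVert^2 recovers gradient descent;
% a fixed mirror \psi_t = \psi recovers mirror descent; a quadratic
% \psi_t(x) = \tfrac{1}{2} x^{\top} G_t x with a time-varying positive-definite G_t
% recovers preconditioned methods such as Adagrad.
```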