Paper Title
A Gradient Complexity Analysis for Minimizing the Sum of Strongly Convex Functions with Varying Condition Numbers
Paper Authors
Paper Abstract
A popular approach to minimizing a finite sum of convex functions is stochastic gradient descent (SGD) and its variants. Fundamental research questions associated with SGD include: (i) to find a lower bound on the number of times that the gradient oracle of each individual function must be accessed in order to find an $ε$-minimizer of the overall objective; (ii) to design algorithms that are guaranteed to find an $ε$-minimizer of the overall objective in expectation using no more than a certain number of accesses (in terms of $1/ε$) to the gradient oracle of each function (i.e., an upper bound). If these two bounds are of the same order of magnitude, then the algorithm may be called optimal. Most existing results along this line of research assume that the functions in the objective share the same condition number. In this paper, the first model we study is the problem of minimizing the sum of finitely many strongly convex functions whose condition numbers are all different. We propose an SGD method for this model and show that it is optimal in gradient computations, up to a logarithmic factor. We then consider a constrained separate block optimization model and present lower and upper bounds for its gradient computation complexity. Next, we propose solving the Fenchel dual of the constrained block optimization model via the SGD method introduced earlier, and show that this yields a lower iteration complexity than solving the original model by an ADMM-type approach. Finally, we extend the analysis to the general composite convex optimization model and obtain gradient-computation complexity results under certain conditions.
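To make the first model concrete, below is a minimal sketch (not the algorithm proposed in the paper) of SGD with nonuniform sampling applied to a finite sum of strongly convex quadratics whose condition numbers differ. The synthetic test problem, the sampling weights proportional to each smoothness constant $L_i$, and the step-size schedule are all illustrative assumptions rather than the paper's actual scheme.

```python
import numpy as np

# Illustrative sketch: SGD with importance sampling on
#   F(x) = (1/n) * sum_i f_i(x),   f_i(x) = 0.5 * x^T A_i x - b_i^T x,
# where each f_i is strongly convex and smooth with its own
# condition number kappa_i = L_i / mu_i.
# Sampling probabilities, step sizes, and the quadratic test
# problem are assumptions made only for this demonstration.

rng = np.random.default_rng(0)
n, d = 5, 10

A, b, L = [], [], []
for i in range(n):
    # Eigenvalue spread grows with i, so the f_i have very different condition numbers.
    eigs = rng.uniform(1.0, 10.0 ** (i + 1), size=d)
    Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
    A.append(Q @ np.diag(eigs) @ Q.T)
    b.append(rng.standard_normal(d))
    L.append(eigs.max())          # smoothness constant of f_i
L = np.array(L)
p = L / L.sum()                   # sample f_i with probability proportional to L_i

def grad_fi(i, x):
    return A[i] @ x - b[i]

x = np.zeros(d)
for t in range(1, 20001):
    i = rng.choice(n, p=p)
    g = grad_fi(i, x) / (n * p[i])                 # unbiased estimate of grad F(x)
    x -= (1.0 / (L[i] * (1 + t // 1000))) * g      # simple decaying step size

full_grad = sum(grad_fi(i, x) for i in range(n)) / n
print("norm of full gradient at final iterate:", np.linalg.norm(full_grad))
```

Scaling the sampled gradient by $1/(n p_i)$ keeps the stochastic gradient unbiased, and biasing the sampling toward functions with larger $L_i$ is a common way to cope with heterogeneous condition numbers; the paper's method and complexity guarantees are developed in the full text.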