The Continuous Stochastic Gradient Method: Part I - Convergence Theory

Paper title

The Continuous Stochastic Gradient Method: Part I -- Convergence Theory

Authors

Max Grieshammer, Lukas Pflug, Michael Stingl, Andrian Uihlein

Abstract

In this contribution, we present a full overview of the continuous stochastic gradient (CSG) method, including convergence results, step size rules and algorithmic insights. We consider optimization problems in which the objective function requires some form of integration, e.g., expected values. Since approximating the integration by a fixed quadrature rule can introduce artificial local solutions into the problem while simultaneously raising the computational effort, stochastic optimization schemes have become increasingly popular in such contexts. However, known stochastic gradient-type methods are typically limited to expected risk functions and inherently require many iterations. The latter is particularly problematic if the evaluation of the cost function involves solving multiple state equations, given, e.g., in the form of partial differential equations. To overcome these drawbacks, a recent article introduced the CSG method, which reuses old gradient sample information via the calculation of design-dependent integration weights to obtain a better approximation to the full gradient. While in the original CSG paper convergence of a subsequence was established for a diminishing step size, here, we provide a complete convergence analysis of CSG for constant step sizes and an Armijo-type line search. Moreover, new methods to obtain the integration weights are presented, extending the application range of CSG to problems involving higher-dimensional integrals and distributed data.
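
The idea of reusing old gradient samples through design-dependent integration weights can be illustrated with a small sketch. This is not the authors' implementation. It assumes a scalar design u, a scalar random parameter x uniform on [0, 1], a toy cost j(u, x) = 0.5 (u - x)^2, a nearest-neighbor rule in the joint (u, x) space with an unweighted sum of absolute differences as the distance, and a fixed midpoint grid for measuring how much of [0, 1] each stored sample covers. The names `nearest_neighbor_weights` and `csg_minimize` are hypothetical.

```python
import numpy as np


def nearest_neighbor_weights(u, past_u, past_x, quad_points):
    """Weight of each stored sample (u_k, x_k) for the current design u.

    Each quadrature point q is assigned to the stored sample closest to
    (u, q). A sample's weight is the fraction of points assigned to it.
    The distance |u - u_k| + |q - x_k| is an assumption of this sketch.
    """
    dist = np.abs(u - past_u)[None, :] + np.abs(quad_points[:, None] - past_x[None, :])
    nearest = np.argmin(dist, axis=1)
    return np.bincount(nearest, minlength=len(past_u)) / len(quad_points)


def csg_minimize(grad_j, u0, quad_points, step=0.5, iters=300, seed=0):
    """Constant-step iteration that reuses all past gradient samples."""
    rng = np.random.default_rng(seed)
    u = float(u0)
    past_u, past_x, past_g = [], [], []
    for _ in range(iters):
        x = rng.uniform(0.0, 1.0)           # one new sample per iteration
        past_u.append(u)
        past_x.append(x)
        past_g.append(grad_j(u, x))         # one new gradient evaluation
        weights = nearest_neighbor_weights(
            u, np.array(past_u), np.array(past_x), quad_points
        )
        g_hat = weights @ np.array(past_g)  # weighted mix of old and new gradients
        u -= step * g_hat
    return u


def toy_gradient(u, x):
    """Gradient in u of the toy cost j(u, x) = 0.5 * (u - x)**2."""
    return u - x


if __name__ == "__main__":
    # Midpoints of 200 equal cells on [0, 1].
    grid = (np.arange(200) + 0.5) / 200
    print(csg_minimize(toy_gradient, u0=2.0, quad_points=grid))
```

For this toy cost the expected value is minimized at u = 0.5, so the iterate should settle near that value. Samples taken at designs far from the current one receive little or no weight, which is why the weighted sum approaches the full gradient as the iterates settle.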
