Paper Title

Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models

Paper Authors

Andrei Patrascu, Ciprian Paduraru, Paul Irofti

Paper Abstract

Stochastic optimization lies at the core of most statistical learning models. Recent development of stochastic algorithmic tools has focused significantly on proximal gradient iterations as an efficient approach to nonsmooth (composite) population risk functions. The complexity of finding optimal predictors by minimizing the regularized risk is largely understood for simple regularizers such as $\ell_1/\ell_2$ norms. However, more complex properties desired of the predictor necessitate highly difficult regularizers, such as those used in grouped lasso or graph trend filtering. In this chapter we develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with stochastic nonsmooth components. We provide iteration-complexity bounds for constant and variable stepsize policies, showing that, for minibatch size $N$, $\varepsilon$-suboptimality in expected quadratic distance to the optimal solution is attained after $\mathcal{O}(\frac{1}{N\varepsilon})$ iterations. Numerical tests on $\ell_2$-regularized SVMs and parametric sparse representation problems confirm the theoretical behaviour and surpass minibatch SGD performance.
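To make the setting concrete, below is a minimal sketch of a minibatch proximal gradient iteration: an averaged stochastic gradient step on the smooth loss followed by a proximal step on the regularizer. This is an illustrative instance on an $\ell_1$-regularized least-squares risk, not the paper's exact algorithm (which also treats the nonsmooth component as stochastic); the names `minibatch_prox_sgd`, `soft_threshold`, and all parameter choices are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def minibatch_prox_sgd(grad_f, X, y, lam, n_iters=3000, batch_size=32,
                       step0=0.5, seed=0):
    """Minibatch proximal SGD for  min_x E[f(x; xi)] + lam * ||x||_1.

    grad_f(x, Xb, yb) returns the averaged gradient of the smooth loss
    on the minibatch (Xb, yb). The 1/k decaying stepsize mirrors the
    variable-stepsize policy mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    x = np.zeros(d)
    for k in range(1, n_iters + 1):
        idx = rng.choice(n, size=batch_size, replace=False)
        step = step0 / k                                  # variable stepsize
        g = grad_f(x, X[idx], y[idx])                     # minibatch gradient
        x = soft_threshold(x - step * g, step * lam)      # proximal step
    return x

# Usage: l1-regularized least squares with a planted sparse predictor.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.0, 0.5]
y = X @ x_true + 0.01 * rng.standard_normal(500)
grad = lambda x, Xb, yb: Xb.T @ (Xb @ x - yb) / len(yb)
x_hat = minibatch_prox_sgd(grad, X, y, lam=0.1)
print(np.round(x_hat, 2))  # approximately recovers [2.0, -1.0, 0.5, 0, ...]
```

Note that this sketch applies the proximal map of a deterministic regularizer; in the stochastic-nonsmooth setting studied in the paper, sampled nonsmooth components would instead enter through their own proximal operators.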
