Paper Title

On the Effectiveness of Richardson Extrapolation in Machine Learning

Paper Authors

Bach, Francis

Paper Abstract

Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by linearly combining several estimates obtained from different values of one of its hyperparameters, without the need to know the inner structure of the original estimation method in detail. The main goal of this paper is to study when Richardson extrapolation can be used within machine learning, beyond the existing applications to step-size adaptation in stochastic gradient descent. We identify two situations where Richardson extrapolation can be useful: (1) when the hyperparameter is the number of iterations of an existing iterative optimization algorithm, with applications to averaged gradient descent and Frank-Wolfe algorithms (where we obtain asymptotic rates of $O(1/k^2)$ on polytopes, with $k$ the number of iterations), and (2) when it is a regularization parameter, with applications to Nesterov smoothing techniques for minimizing non-smooth functions (where we obtain asymptotic rates close to $O(1/k^2)$), and to ridge regression. In all these cases, we show that extrapolation techniques come with no significant loss in performance, but sometimes with strong gains, and we provide theoretical justifications for such gains based on asymptotic expansions, as well as empirical illustrations on classical problems from machine learning.
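To make the general principle concrete (this is only the textbook numerical-analysis instance, not the paper's specific constructions for optimization or ridge regression): if an estimate $f(h)$ of a quantity $f^*$ has error $f(h) = f^* + c\,h + O(h^2)$ in a hyperparameter $h$, then the linear combination $2 f(h/2) - f(h) = f^* + O(h^2)$ cancels the leading error term. A minimal Python sketch on the classical finite-difference example, with a function and step size chosen purely for illustration:

```python
import numpy as np

def forward_diff(f, x, h):
    # One-sided finite difference: approximates f'(x) with O(h) error.
    return (f(x + h) - f(x)) / h

def richardson(f, x, h):
    # First-order Richardson extrapolation: 2*D(h/2) - D(h)
    # cancels the O(h) error term, leaving an O(h^2) error.
    return 2 * forward_diff(f, x, h / 2) - forward_diff(f, x, h)

x, h = 1.0, 1e-3
exact = np.cos(x)                       # true derivative of sin at x
plain = forward_diff(np.sin, x, h)      # error on the order of 1e-4
extrap = richardson(np.sin, x, h)       # error on the order of 1e-8
print(abs(plain - exact), abs(extrap - exact))
```

The same two-point recombination idea is what the paper applies when the hyperparameter is the iteration count of an optimization algorithm or a regularization/smoothing parameter, with the error expansion in $h$ replaced by the corresponding asymptotic expansion of the estimator.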
