Paper Title
Scale-free Unconstrained Online Learning for Curved Losses
Paper Authors
Paper Abstract
A sequence of works in unconstrained online convex optimisation have investigated the possibility of adapting simultaneously to the norm $U$ of the comparator and the maximum norm $G$ of the gradients. In full generality, matching upper and lower bounds are known which show that this comes at the unavoidable cost of an additive $G U^3$, which is not needed when either $G$ or $U$ is known in advance. Surprisingly, recent results by Kempka et al. (2019) show that no such price for adaptivity is needed in the specific case of $1$-Lipschitz losses like the hinge loss. We follow up on this observation by showing that there is in fact never a price to pay for adaptivity if we specialise to any of the other common supervised online learning losses: our results cover log loss, (linear and non-parametric) logistic regression, square loss prediction, and (linear and non-parametric) least-squares regression. We also fill in several gaps in the literature by providing matching lower bounds with an explicit dependence on $U$. In all cases we obtain scale-free algorithms, which are suitably invariant under rescaling of the data. Our general goal is to establish achievable rates without concern for computational efficiency, but for linear logistic regression we also provide an adaptive method that is as efficient as the recent non-adaptive algorithm by Agarwal et al. (2021).
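To make the "price for adaptivity" mentioned in the abstract concrete, one schematic reading (an illustration, not a statement taken from the paper) of the known general-case trade-off is the following, where $w_t$ denotes the learner's prediction and $\ell_t$ the convex loss at round $t$, notation that is assumed here rather than used in the abstract:
$$
R_T(u) \;=\; \sum_{t=1}^{T} \bigl( \ell_t(w_t) - \ell_t(u) \bigr) \;\lesssim\; G\,U\sqrt{T}\cdot(\text{log factors}) \;+\; G\,U^3 ,
$$
with $U = \|u\|$ and $G$ the largest gradient norm, neither known in advance. The additive $G U^3$ term is the cost referred to in the abstract; it can be avoided when $G$ or $U$ is known beforehand, and the paper's claim is that for curved losses such as the log loss or the square loss it can be avoided even without such prior knowledge. The logarithmic factors are left unspecified here; only the additive $G U^3$ term is drawn from the abstract itself.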