Paper Title
Online Convex Optimization with Unbounded Memory
Paper Authors
Paper Abstract
Online convex optimization (OCO) is a widely used framework in online learning. In each round, the learner chooses a decision in a convex set and an adversary chooses a convex loss function, and then the learner suffers the loss associated with their current decision. However, in many applications the learner's loss depends not only on the current decision but also on the entire history of decisions up to that point. The OCO framework and its existing generalizations do not capture this, and they can only be applied to many settings of interest after a long series of approximation arguments. They also leave open the question of whether the dependence on memory is tight, because there are no non-trivial lower bounds. In this work we introduce a generalization of the OCO framework, "Online Convex Optimization with Unbounded Memory", that captures long-term dependence on past decisions. We introduce the notion of $p$-effective memory capacity, $H_p$, which quantifies the maximum influence of past decisions on present losses. We prove an $O(\sqrt{H_p T})$ upper bound on the policy regret and a matching (worst-case) lower bound. As a special case, we prove the first non-trivial lower bound for OCO with finite memory \citep{anavaHM2015online}, which could be of independent interest, and also improve existing upper bounds. We demonstrate the broad applicability of our framework by using it to derive regret bounds, and to improve and simplify existing regret bound derivations, for a variety of online learning problems including online linear control and an online variant of performative prediction.
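The interaction protocol the abstract describes can be sketched in code. The following is an illustrative toy, not the paper's algorithm: the geometric discount factor `lam` (a linear "unbounded memory" operator folding all past decisions into a history vector), the unit-ball decision set, and the surrogate projected-gradient step are all assumptions made purely for this example.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius (the convex decision set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def oco_with_memory(loss_grads, dim, lam=0.5, eta=0.1, radius=1.0):
    """Toy OCO-with-memory loop: each round's loss acts not on the current
    decision alone but on a discounted history h_t = sum_s lam^(t-s) x_s
    of all past decisions. The learner runs projected gradient descent on
    a surrogate (treating dh_t/dx_t as the identity), a hypothetical choice
    for illustration only.

    loss_grads: list of callables; loss_grads[t](h) returns the adversary's
                loss gradient at the history vector h in round t.
    """
    x = np.zeros(dim)              # current decision
    h = np.zeros(dim)              # discounted history of all decisions so far
    decisions, histories = [], []
    for grad in loss_grads:
        h = lam * h + x            # fold the current decision into the memory
        decisions.append(x.copy())
        histories.append(h.copy()) # loss this round is evaluated at h, not x
        x = project_ball(x - eta * grad(h), radius)
    return decisions, histories
```

For instance, with quadratic losses $\ell_t(h) = \tfrac{1}{2}\|h - v\|^2$ for a fixed target $v$, the gradient callable is `lambda h: h - v`, and the learner's decisions drift so that the *history*, not the instantaneous decision, tracks $v$.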