Comparator-adaptive Convex Bandits

Paper Title

Comparator-adaptive Convex Bandits

Authors

Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo

Abstract

We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for linear bandits. Then, we extend the ideas to convex bandits with Lipschitz or smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses.

Paper Title