Title
Minimax Policy for Heavy-tailed Bandits
Authors
Abstract
We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret with heavy-tailed reward distributions. We modify MOSS, a minimax policy designed for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS. We show that if the moment of order $1+ε$ exists for the reward distribution, then the refined strategy has a worst-case regret matching the lower bound, while maintaining distribution-dependent logarithmic regret.
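To illustrate the idea behind a saturated empirical mean, the sketch below clips each observation at a fixed magnitude before averaging, which bounds the influence of heavy-tailed outliers. This is a minimal sketch only: the function name is hypothetical, and Robust MOSS uses a specific saturation schedule defined in the paper rather than the fixed illustrative threshold used here.

```python
def saturated_empirical_mean(rewards, threshold):
    """Clip each reward's magnitude at `threshold`, then average.

    Clipping bounds the effect of heavy-tailed outliers, which is why
    truncation-based estimators work when only a moment of order 1+eps
    exists. The saturation level used by Robust MOSS differs; this
    fixed threshold is purely illustrative.
    """
    clipped = [max(-threshold, min(threshold, r)) for r in rewards]
    return sum(clipped) / len(clipped)

# A heavy-tailed sample: one extreme outlier dominates the plain mean.
sample = [0.1, -0.2, 0.3, 0.0, 1000.0]
plain = sum(sample) / len(sample)                          # 200.04
robust = saturated_empirical_mean(sample, threshold=1.0)   # 0.24
```

The plain empirical mean is dragged to roughly 200 by the single outlier, while the saturated mean stays near the bulk of the data, at the cost of some bias from clipping.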