Title
Minimax Policy for Heavy-tailed Bandits
Authors
Abstract
We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret with heavy-tailed reward distributions. We modify MOSS, a minimax policy designed for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS. We show that if the moment of order $1+ε$ exists for the reward distribution, then the refined strategy has a worst-case regret matching the lower bound, while maintaining distribution-dependent logarithmic regret.
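To illustrate the idea behind a saturated empirical mean, the sketch below clips each observation at a fixed magnitude before averaging, which bounds the influence of heavy-tailed outliers. This is a minimal sketch only: the function name is hypothetical, and Robust MOSS uses a specific saturation schedule defined in the paper rather than the fixed illustrative threshold used here.

```python
def saturated_empirical_mean(rewards, threshold):
    """Clip each reward's magnitude at `threshold`, then average.

    Clipping bounds the effect of heavy-tailed outliers, which is why
    truncation-based estimators work when only a moment of order 1+eps
    exists. The saturation level used by Robust MOSS differs; this
    fixed threshold is purely illustrative.
    """
    clipped = [max(-threshold, min(threshold, r)) for r in rewards]
    return sum(clipped) / len(clipped)

# A heavy-tailed sample: one extreme outlier dominates the plain mean.
sample = [0.1, -0.2, 0.3, 0.0, 1000.0]
plain = sum(sample) / len(sample)                          # 200.04
robust = saturated_empirical_mean(sample, threshold=1.0)   # 0.24
```

The plain empirical mean is dragged to roughly 200 by the single outlier, while the saturated mean stays near the bulk of the data, at the cost of some bias from clipping.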