Paper title
Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits
Paper authors
Paper abstract
In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there is a finite number of changepoints at which the means of some or all arms change simultaneously. We introduce the safety constraint studied by \citet{wu2016conservative} to this setting, which requires that at every round the cumulative reward stays above a constant factor of the default action's reward. We propose two actively adaptive algorithms for this setting that satisfy the safety constraint, detect changepoints, and restart without knowledge of the number of changepoints or their locations. We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit literature. We also provide the first matching lower bounds for this setting. Empirically, we show that our safety-aware algorithms perform similarly to state-of-the-art actively adaptive algorithms that do not satisfy the safety constraint.
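To make the safety constraint in the abstract concrete, the following is a minimal sketch of how one might check it on a recorded run. It is an illustration under stated assumptions, not the paper's algorithm: the function name `satisfies_safety_constraint`, the parameter `alpha` (so that the constant factor is `1 - alpha`), and the use of per-round reward sequences for the learner and the default action are all hypothetical choices, and the paper may state the constraint in terms of expected rather than observed rewards.

```python
from typing import Sequence


def satisfies_safety_constraint(
    rewards: Sequence[float],
    default_rewards: Sequence[float],
    alpha: float,
) -> bool:
    """Check a conservative-bandit style safety constraint on a recorded run.

    Assumption (not taken from the paper): the constraint holds when, at
    every round t, the learner's cumulative reward is at least
    (1 - alpha) times the cumulative reward of the default action.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(rewards) != len(default_rewards):
        raise ValueError("both sequences must cover the same rounds")

    cumulative = 0.0          # learner's cumulative reward so far
    cumulative_default = 0.0  # default action's cumulative reward so far
    for reward, default_reward in zip(rewards, default_rewards):
        cumulative += reward
        cumulative_default += default_reward
        # The constraint must hold at every round, not only at the end.
        if cumulative < (1.0 - alpha) * cumulative_default:
            return False
    return True


if __name__ == "__main__":
    # The default action's mean changes after round 3 (a changepoint).
    default = [0.5, 0.5, 0.5, 0.2, 0.2, 0.2]
    learner = [0.4, 0.6, 0.5, 0.3, 0.9, 0.9]
    print(satisfies_safety_constraint(learner, default, alpha=0.5))
```

Because the default action's reward can itself shift at a changepoint, the sketch accumulates the default reward round by round instead of multiplying a single fixed mean by the round index.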