论文标题
汤普森(Thompson
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
论文作者
论文摘要
我们研究了长期公平约束〜(CSMAB-F)的组合睡眠多臂半伴侣问题。为了解决这个问题,我们采用汤普森采样〜(TS)来最大化总奖励,并使用虚拟队列技术来处理公平的约束,并设计一种称为\ emph {ts beta priors和bernoulli可能性的算法,用于CSMAB-f〜(tscsf-f〜(tscsf-f〜(tscsf-b))}。此外,我们证明TSCSF-B可以满足公平性的约束,而遗憾的遗憾则由$ \ frac {n} {2η} {2η} + o \ left(\ frac {\ frac {\ sqrt {mnt \ snt {mnt \ ln t}}}}}}} {t} {t} {t} \ right $ n us $ n $ n $ n $ n $,同时在每个回合中(基数约束)和$η$是奖励公平性的参数交易。通过放松公平限制(即,令$η\ rightarrow \ infty $),限制归结为组合睡眠多臂半束缚问题的TS算法的第一个独立的TS算法界限。最后,我们执行数值实验,并使用高级电影推荐应用来显示所提出算法的有效性和效率。
We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints~(CSMAB-F). To address the problem, we adopt Thompson Sampling~(TS) to maximize the total rewards and use virtual queue techniques to handle the fairness constraints, and design an algorithm called \emph{TS with beta priors and Bernoulli likelihoods for CSMAB-F~(TSCSF-B)}. Further, we prove TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2η} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be pulled simultaneously in each round~(the cardinality constraint) and $η$ is the parameter trading off fairness for rewards. By relaxing the fairness constraints (i.e., let $η\rightarrow \infty$), the bound boils down to the first problem-independent bound of TS algorithms for combinatorial sleeping multi-armed semi-bandit problems. Finally, we perform numerical experiments and use a high-rating movie recommendation application to show the effectiveness and efficiency of the proposed algorithm.