汤普森（Thompson

论文标题

汤普森（Thompson

Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

论文作者

Huang, Zhiming, Xu, Yifan, Hu, Bingshan, Wang, Qipeng, Pan, Jianping

论文摘要

我们研究了长期公平约束〜（CSMAB-F）的组合睡眠多臂半伴侣问题。为了解决这个问题，我们采用汤普森采样〜（TS）来最大化总奖励，并使用虚拟队列技术来处理公平的约束，并设计一种称为\ emph {ts beta priors和bernoulli可能性的算法，用于CSMAB-f〜（tscsf-f〜（tscsf-f〜（tscsf-b））}。此外，我们证明TSCSF-B可以满足公平性的约束，而遗憾的遗憾则由$ \ frac {n} {2η} {2η} + o \ left（\ frac {\ frac {\ sqrt {mnt \ snt {mnt \ ln t}}}}}}} {t} {t} {t} \ right $ n us $ n $ n $ n $ n $，同时在每个回合中（基数约束）和$η$是奖励公平性的参数交易。通过放松公平限制（即，令$η\ rightarrow \ infty $），限制归结为组合睡眠多臂半束缚问题的TS算法的第一个独立的TS算法界限。最后，我们执行数值实验，并使用高级电影推荐应用来显示所提出算法的有效性和效率。

We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints~(CSMAB-F). To address the problem, we adopt Thompson Sampling~(TS) to maximize the total rewards and use virtual queue techniques to handle the fairness constraints, and design an algorithm called \emph{TS with beta priors and Bernoulli likelihoods for CSMAB-F~(TSCSF-B)}. Further, we prove TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2η} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be pulled simultaneously in each round~(the cardinality constraint) and $η$ is the parameter trading off fairness for rewards. By relaxing the fairness constraints (i.e., let $η\rightarrow \infty$), the bound boils down to the first problem-independent bound of TS algorithms for combinatorial sleeping multi-armed semi-bandit problems. Finally, we perform numerical experiments and use a high-rating movie recommendation application to show the effectiveness and efficiency of the proposed algorithm.

下载PDF全文

下载文献需遵守相关版权规定

论文标题