与概率触发的武器或独立武器的组合半伴侣组合的批量大小的独立遗憾界限

论文标题

与概率触发的武器或独立武器的组合半伴侣组合的批量大小的独立遗憾界限

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

论文作者

Liu, Xutong, Zuo, Jinhang, Wang, Siwei, Joe-Wong, Carlee, Lui, John C. S., Chen, Wei

论文摘要

在本文中，我们研究了组合半伴侣（CMAB），并着重于减少遗憾界限中批处理大小的$ k $的依赖性，其中$ k $是可以在每回合中拉出或触发的武器总数。首先，对于用概率触发的臂（CMAB-T）设置CMAB，我们发现了一种新颖的（定向）触发概率和方差调制（TPVM）条件，可以替代各种应用程序的平滑度条件，例如级联的basccading Bandits，在线网络探索和在线探索和在线影响最大化。在这种新条件下，我们提出了带有方差感知置信区间的BCUCB-T算法，并进行遗憾分析，将$ O（k）$ cactors降低至$ O（\ log k）$或$ o（\ log^2 k）$（\ log^2 k）$ in Realist Bond的限制，显着改善了上述应用程序的遗憾范围。其次，对于用独立武器的非触发CMAB设置，我们提出了一种SESCB算法，该算法利用了TPVM条件的非触发版本，并完全消除了对$ K $的依赖，以备受遗憾。作为有价值的副产品，本文中使用的遗憾分析可以将几种现有结果提高到$ O（\ log K）$。最后，实验评估表明，与不同应用中的基准算法相比，我们的表现出色。

In this paper, we study the combinatorial semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct regret analysis which reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.

下载PDF全文

下载文献需遵守相关版权规定

论文标题