多项式logit匪徒的完全依赖性依赖性边界

论文标题

多项式logit匪徒的完全依赖性依赖性边界

Fully Gap-Dependent Bounds for Multinomial Logit Bandit

论文作者

Yang, Jiaqi

论文摘要

我们研究了多项式logit（MNL）强盗问题，在每个时间步骤中，卖方在$ n $物品池中提供最多$ k $的大小，而买家根据MNL选择型号从分类中购买商品。目的是学习模型参数并最大程度地提高预期收入。我们提出（i）一种算法，该算法在$ \ widetilde {o}（\ sum_ {i = 1}^nδ_i^{ - 2}）$ time步骤中识别$ \ widetilde {o}（\ sum_ {i = 1} \ log t）$遗憾在$ t $时间步长中。据我们所知，我们的算法是第一个实现GAP依赖性界限的算法，完全取决于所有项目的次级次数差距。我们的技术贡献包括一个将MNL伴侣问题与多臂匪徒中顶级$ K $臂识别问题的变体联系起来的算法框架，基于广义的时期发行程序以及基于层的自适应估计程序。

We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to a MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i = 1}^N Δ_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} KΔ_i^{-1} \log T)$ regret in $T$ time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depends on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-$K$ arm identification problem in multi-armed bandits, a generalized epoch-based offering procedure, and a layer-based adaptive estimation procedure.

下载PDF全文

下载文献需遵守相关版权规定

论文标题