论文标题
多项式logit匪徒的完全依赖性依赖性边界
Fully Gap-Dependent Bounds for Multinomial Logit Bandit
论文作者
论文摘要
我们研究了多项式logit(MNL)强盗问题,在每个时间步骤中,卖方在$ n $物品池中提供最多$ k $的大小,而买家根据MNL选择型号从分类中购买商品。目的是学习模型参数并最大程度地提高预期收入。我们提出(i)一种算法,该算法在$ \ widetilde {o}(\ sum_ {i = 1}^nδ_i^{ - 2})$ time步骤中识别$ \ widetilde {o}(\ sum_ {i = 1} \ log t)$遗憾在$ t $时间步长中。据我们所知,我们的算法是第一个实现GAP依赖性界限的算法,完全取决于所有项目的次级次数差距。我们的技术贡献包括一个将MNL伴侣问题与多臂匪徒中顶级$ K $臂识别问题的变体联系起来的算法框架,基于广义的时期发行程序以及基于层的自适应估计程序。
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to a MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i = 1}^N Δ_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} KΔ_i^{-1} \log T)$ regret in $T$ time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depends on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-$K$ arm identification problem in multi-armed bandits, a generalized epoch-based offering procedure, and a layer-based adaptive estimation procedure.