论文标题
汤普森在匹配市场中进行强盗学习的抽样
Thompson Sampling for Bandit Learning in Matching Markets
论文作者
论文摘要
双面匹配市场的问题具有广泛的现实应用程序,并且在文献中进行了广泛的研究。最近的一系列作品集中在问题设置上,其中一侧市场参与者的偏好是未知的\ emph {a先验},并且通过与参与者的另一端进行迭代互动而学习。所有这些作品均基于探索 - 然后使用(ETC)和上置信度结合(UCB)算法,这是多臂匪(MAB)中的两种常见策略。汤普森采样(TS)是另一种流行的方法,由于其更容易实施和更好的经验表现,它引起了很多关注。在许多问题中,即使已经分析了UCB和ETC-type算法,研究人员仍在尝试研究TS的好处。但是,TS的收敛分析更具挑战性,并且在许多问题设置中仍然开放。在本文中,我们在迭代匹配市场的新环境中为TS提供了第一个遗憾分析。广泛的实验证明了TS型算法比ETC和UCB型基线的实际优势。
The problem of two-sided matching markets has a wide range of real-world applications and has been extensively studied in the literature. A line of recent works have focused on the problem setting where the preferences of one-side market participants are unknown \emph{a priori} and are learned by iteratively interacting with the other side of participants. All these works are based on explore-then-commit (ETC) and upper confidence bound (UCB) algorithms, two common strategies in multi-armed bandits (MAB). Thompson sampling (TS) is another popular approach, which attracts lots of attention due to its easier implementation and better empirical performances. In many problems, even when UCB and ETC-type algorithms have already been analyzed, researchers are still trying to study TS for its benefits. However, the convergence analysis of TS is much more challenging and remains open in many problem settings. In this paper, we provide the first regret analysis for TS in the new setting of iterative matching markets. Extensive experiments demonstrate the practical advantages of the TS-type algorithm over the ETC and UCB-type baselines.