双面在线市场的动态匹配强盗

论文标题

双面在线市场的动态匹配强盗

Dynamic Matching Bandit For Two-Sided Online Markets

论文作者

Li, Yuantong, Wang, Chi-hua, Cheng, Guang, Sun, Will Wei

论文摘要

双面在线匹配平台在各个市场中都采用。但是，代理在当前市场中的偏好通常是隐式且未知的，因此需要从数据中学到。随着决策过程中涉及动态侧面信息的越来越多的可用性，现代的在线匹配方法要求能够根据上下文信息跟踪代理的转移偏好。这促使我们为这个动态的在线匹配问题提出了一个新颖的框架，并提供了上下文信息，这允许在匹配决策中进行动态偏好。现有作品专注于与静态偏好的在线匹配，但这是不够的：一方面的上下文信息更新后，双面偏好会更改，从而导致非静态匹配。在本文中，我们提出了一种动态匹配的匪徒算法，以适应此问题。所提出的动态匹配算法的关键组成部分是对具有统计保证的首选项排名的在线估计。从理论上讲，我们表明所提出的动态匹配算法可为代理 - 最佳稳定匹配结果提供高概率。特别是，我们证明了对数遗憾上限$ \ Mathcal {o}（\ log（t））$，并构造了相应的与实例有关的匹配遗憾下限。在实验中，我们证明了动态匹配算法对各种偏好方案，上下文的维度，奖励噪声水平和上下文变化水平以及其在寻求工作的市场中的应用进一步证明了所提出的方法的实际用途。

Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, thus needing to be learned from data. With the growing availability of dynamic side information involved in the decision process, modern online matching methodology demands the capability to track shifting preferences for agents based on contextual information. This motivates us to propose a novel framework for this dynamic online matching problem with contextual information, which allows for dynamic preferences in matching decisions. Existing works focus on online matching with static preferences, but this is insufficient: the two-sided preference changes as soon as one side's contextual information updates, resulting in non-static matching. In this paper, we propose a dynamic matching bandit algorithm to adapt to this problem. The key component of the proposed dynamic matching algorithm is an online estimation of the preference ranking with a statistical guarantee. Theoretically, we show that the proposed dynamic matching algorithm delivers an agent-optimal stable matching result with high probability. In particular, we prove a logarithmic regret upper bound $\mathcal{O}(\log(T))$ and construct a corresponding instance-dependent matching regret lower bound. In the experiments, we demonstrate that dynamic matching algorithm is robust to various preference schemes, dimensions of contexts, reward noise levels, and context variation levels, and its application to a job-seeking market further demonstrates the practical usage of the proposed method.

下载PDF全文

下载文献需遵守相关版权规定

论文标题