Paper Title

Decentralized Competing Bandits in Non-Stationary Matching Markets

Paper Authors

Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi, Arya Mazumdar

Paper Abstract

Understanding the complex dynamics of two-sided online matching markets, where the demand-side agents compete to match with the supply side (arms), has recently received substantial interest. To that end, in this paper, we introduce a framework for decentralized two-sided matching markets under non-stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side agents have unknown and different preferences over the supply side (arms), but the arms have fixed and known preferences over the agents. We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (\texttt{DNCB}), in which the agents play (restrictive) successive-elimination-type learning algorithms to learn their preferences over the arms. The complexity of understanding such a system stems from the fact that the competing bandits choose their actions asynchronously, and the lower-ranked agents can only learn from the set of arms not \emph{dominated} by the higher-ranked agents, which leads to \emph{forced exploration}. With carefully defined complexity parameters, we characterize this \emph{forced exploration} and obtain sub-linear (logarithmic) regret for \texttt{DNCB}. Furthermore, we validate our theoretical findings via experiments.
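
The abstract outlines three mechanics: a serial-dictatorship order, successive-elimination learning, and the forced exploration of lower-ranked agents over non-dominated arms. The following is a minimal, self-contained Python sketch of that dynamic, not the authors' \texttt{DNCB} algorithm itself; every parameter (the number of agents and arms, the horizon, a single abrupt change point, and epoch-based restarts as a crude stand-in for the paper's non-stationarity handling) is an illustrative assumption.

```python
# A minimal, illustrative sketch (NOT the paper's DNCB implementation) of a
# decentralized matching-bandit loop under serial dictatorship. All numbers
# below (n_agents, n_arms, horizon, change_point, epoch_len) are assumptions
# chosen for this toy example.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_arms, horizon = 3, 5, 20_000
change_point = horizon // 2      # one abrupt change mimics non-stationarity
epoch_len = 2_000                # periodic restarts: a crude proxy for the
                                 # drift handling analyzed in the paper

# Two sets of Bernoulli arm means, before and after the change point.
means = [rng.uniform(0.1, 0.9, size=(n_agents, n_arms)) for _ in range(2)]

class SuccessiveEliminationAgent:
    """One demand-side agent running successive elimination on its arms."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.active = list(range(self.n_arms))
        self.counts = np.zeros(self.n_arms)
        self.sums = np.zeros(self.n_arms)

    def choose(self, allowed):
        # Forced exploration: a lower-ranked agent can only pick among arms
        # left over by higher-ranked agents; play the least-sampled one.
        candidates = [a for a in self.active if a in allowed] or list(allowed)
        return min(candidates, key=lambda a: self.counts[a])

    def update(self, arm, reward, t):
        self.counts[arm] += 1
        self.sums[arm] += reward
        if any(self.counts[a] == 0 for a in self.active):
            return               # eliminate only once every arm is sampled
        mu = self.sums[self.active] / self.counts[self.active]
        rad = np.sqrt(2.0 * np.log(max(t, 2)) / self.counts[self.active])
        best_lcb = np.max(mu - rad)
        # Drop arms whose upper confidence bound falls below the best LCB.
        self.active = [a for a, m, r in zip(self.active, mu, rad)
                       if m + r >= best_lcb]

agents = [SuccessiveEliminationAgent(n_arms) for _ in range(n_agents)]

for t in range(1, horizon + 1):
    if t % epoch_len == 0:       # restart so stale pre-change samples are forgotten
        for ag in agents:
            ag.reset()
    phase = int(t > change_point)
    taken = set()
    for rank, ag in enumerate(agents):   # serial dictatorship: rank order
        allowed = [a for a in range(n_arms) if a not in taken]
        arm = ag.choose(allowed)
        taken.add(arm)
        reward = float(rng.random() < means[phase][rank, arm])
        ag.update(arm, reward, t)

print("active sets after the final epoch:", [ag.active for ag in agents])
```

In this toy run, the rank-1 agent eliminates arms freely, while lower-ranked agents can only sample the leftover arms each round, which loosely illustrates the forced-exploration effect the abstract describes.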
