在与非侧心代理商的Stackelberg游戏中学习

论文标题

在与非侧心代理商的Stackelberg游戏中学习

Learning in Stackelberg Games with Non-myopic Agents

论文作者

Haghtalab, Nika, Lykouris, Thodoris, Nietert, Sloan, Wei, Alexander

论文摘要

我们研究了Stackelberg游戏，其中一位校长反复与非洋流长期的代理商进行互动，而不知道代理商的回报功能。尽管当代理商是近视时，在Stackelberg游戏中的学习是可以很好地理解的，但与非侧型代理商打交道会带来额外的并发症。尤其是，非侧心代理可以制定战略和选择当前劣等的行动，以误导校长的学习算法并在未来获得更好的结果。我们提供了一个通用框架，该框架可在存在近视剂的情况下降低非侧型剂的学习以优化强大的匪徒。通过设计和分析微型反应性匪徒算法，我们的还原从校长学习算法的统计效率中进行了差异，以与其在诱导接近最佳的响应中的有效性。我们将此框架应用于Stackelberg Security Games（SSGS），需求曲线未知的价格，一般有限的Stackelberg游戏和战略分类。在每种情况下，我们都表征了近乎最佳响应中存在错误的类型和影响，并为此类拼写错误开发了一种学习算法。在途中，我们通过发现这些游戏的基本结构属性，从$ o（n^3）$转化为$ o（n^3）$的SSG学习的最新查询复杂性。后者的结果是与非侧心药物学习之外的独立兴趣。

We study Stackelberg games where a principal repeatedly interacts with a non-myopic long-lived agent, without knowing the agent's payoff function. Although learning in Stackelberg games is well-understood when the agent is myopic, dealing with non-myopic agents poses additional complications. In particular, non-myopic agents may strategize and select actions that are inferior in the present in order to mislead the principal's learning algorithm and obtain better outcomes in the future. We provide a general framework that reduces learning in presence of non-myopic agents to robust bandit optimization in the presence of myopic agents. Through the design and analysis of minimally reactive bandit algorithms, our reduction trades off the statistical efficiency of the principal's learning algorithm against its effectiveness in inducing near-best-responses. We apply this framework to Stackelberg security games (SSGs), pricing with unknown demand curve, general finite Stackelberg games, and strategic classification. In each setting, we characterize the type and impact of misspecifications present in near-best responses and develop a learning algorithm robust to such misspecifications. On the way, we improve the state-of-the-art query complexity of learning in SSGs with $n$ targets from $O(n^3)$ to a near-optimal $\widetilde{O}(n)$ by uncovering a fundamental structural property of these games. The latter result is of independent interest beyond learning with non-myopic agents.

下载PDF全文

下载文献需遵守相关版权规定

论文标题