Paper Title

Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

Paper Authors

Matthias Gerstgrasser, David C. Parkes

Paper Abstract

Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
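
The following is a minimal, illustrative sketch (not the paper's implementation) of the leader/follower structure the abstract describes, worked out on a toy bimatrix Stackelberg game. All names here (`ContextualFollower`, `leader_payoff`, `leader_value`, etc.) are hypothetical; in the deep-RL setting described in the paper, the tabular follower below would be a context-conditioned network trained with multitask/meta-RL, and the leader's grid search would be replaced by an RL algorithm trained against that pre-trained follower.

```python
# Illustrative sketch only: not the authors' implementation.
# Shows the "leader outer loop queries a contextual follower" structure
# on a toy 2x2 Stackelberg game.

import numpy as np

# Toy 2x2 game: rows = leader actions, cols = follower actions.
leader_payoff = np.array([[1.0, 3.0],
                          [2.0, 1.0]])
follower_payoff = np.array([[1.0, 0.0],
                            [0.0, 2.0]])


class ContextualFollower:
    """Follower policy conditioned on the leader's commitment (the 'context').

    Here 'training' is an exact best response precomputed on a grid of
    contexts; a deep-RL version would learn a context-conditioned network.
    """

    def __init__(self, n_contexts: int = 101):
        self.grid = np.linspace(0.0, 1.0, n_contexts)
        # Best-response action for each gridded commitment p = P(leader action 0).
        self.table = np.array([
            np.argmax(np.array([p, 1.0 - p]) @ follower_payoff)
            for p in self.grid
        ])

    def respond(self, p: float) -> int:
        """Return the (approximate) best-response action to commitment p."""
        idx = int(np.argmin(np.abs(self.grid - p)))
        return int(self.table[idx])


def leader_value(p: float, follower: ContextualFollower) -> float:
    """Leader's expected payoff when committing to P(action 0) = p."""
    a_f = follower.respond(p)
    return p * leader_payoff[0, a_f] + (1.0 - p) * leader_payoff[1, a_f]


if __name__ == "__main__":
    follower = ContextualFollower()
    # Leader 'learning' reduced to a search over commitments for this toy game;
    # the paper's setting would instead train the leader with RL against the
    # pre-trained contextual follower.
    candidates = np.linspace(0.0, 1.0, 101)
    values = [leader_value(p, follower) for p in candidates]
    best = candidates[int(np.argmax(values))]
    print(f"Approximate Stackelberg commitment: P(action 0) = {best:.2f}, "
          f"leader value = {max(values):.2f}")
```

The intended design point, as suggested by the abstract, is that the follower is trained once across many contexts (leader commitments), so the leader's outer training loop can query an approximate best response immediately rather than waiting for an inner follower-retraining loop, which is where the claimed sample-efficiency gains would come from.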
