Paper Title
Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL
Paper Authors
Abstract
Multi-agent Reinforcement Learning (MARL) is a powerful tool for training autonomous agents acting independently in a common environment. However, it can lead to sub-optimal behavior when individual incentives and group incentives diverge. Humans are remarkably capable of solving these social dilemmas. It is an open problem in MARL to replicate such cooperative behaviors in selfish agents. In this work, we draw upon the idea of formal contracting from economics to overcome diverging incentives between agents in MARL. We propose an augmentation to a Markov game where agents voluntarily agree to binding transfers of reward, under pre-specified conditions. Our contributions are theoretical and empirical. First, we show that this augmentation makes all subgame-perfect equilibria of all Fully Observable Markov Games exhibit socially optimal behavior, given a sufficiently rich space of contracts. Next, we show that for general contract spaces, and even under partial observability, richer contract spaces lead to higher welfare. Hence, contract space design solves an exploration-exploitation tradeoff, sidestepping incentive issues. We complement our theoretical analysis with experiments. Issues of exploration in the contracting augmentation are mitigated using a training methodology inspired by multi-objective reinforcement learning: Multi-Objective Contract Augmentation Learning (MOCA). We test our methodology in static, single-move games, as well as dynamic domains that simulate traffic, pollution management and common pool resource management.
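To make the idea of binding reward transfers concrete, the following is a minimal sketch on a static, single-move game. It is not the paper's implementation or MOCA. The Prisoner's Dilemma payoffs, the contract form (each defector pays a fixed fine to the other agent), and the names `BASE`, `with_contract`, and `pure_nash` are all assumptions chosen for illustration. The sketch also skips the stage in which agents propose and voluntarily accept a contract, and only shows how an accepted contract changes the equilibria.

```python
from itertools import product

C, D = 0, 1  # action indices: cooperate, defect

# Hypothetical Prisoner's Dilemma payoffs (assumed, not taken from the paper).
# BASE[(a0, a1)] = (reward of agent 0, reward of agent 1)
BASE = {
    (C, C): (3.0, 3.0),
    (C, D): (0.0, 5.0),
    (D, C): (5.0, 0.0),
    (D, D): (1.0, 1.0),
}


def with_contract(base, fine):
    """Apply an assumed contract: each agent that defects pays `fine` to the other."""
    out = {}
    for (a0, a1), (r0, r1) in base.items():
        # Net transfer from agent 0 to agent 1 (negative means agent 1 pays).
        t = fine * (a0 == D) - fine * (a1 == D)
        out[(a0, a1)] = (r0 - t, r1 + t)
    return out


def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player, two-action game."""
    equilibria = []
    for a0, a1 in product((C, D), repeat=2):
        r0, r1 = payoffs[(a0, a1)]
        # Neither agent can gain by unilaterally switching its action.
        best0 = all(r0 >= payoffs[(b, a1)][0] for b in (C, D))
        best1 = all(r1 >= payoffs[(a0, b)][1] for b in (C, D))
        if best0 and best1:
            equilibria.append((a0, a1))
    return equilibria


if __name__ == "__main__":
    print("without contract:", pure_nash(BASE))
    print("with contract:   ", pure_nash(with_contract(BASE, 3.0)))
```

With these assumed payoffs, the only pure equilibrium without a contract is mutual defection, and with a fine of 3.0 the only pure equilibrium is mutual cooperation. Transfers are zero-sum, so the total welfare of every joint action is unchanged and only the individual incentives move.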