Offsetting Unequal Competition through RL-assisted Incentive Schemes

Paper title

Offsetting Unequal Competition through RL-assisted Incentive Schemes

Authors

Paramita Koley, Aurghya Maiti, Sourangshu Bhattacharya, Niloy Ganguly

Abstract

This paper investigates the dynamics of competition among organizations with unequal expertise. Multi-agent reinforcement learning is used to simulate and understand the impact of various incentive schemes designed to offset such inequality. We design Touch-Mark, a game based on the well-known multi-agent particle environment, in which two teams (weak and strong) with unequal but changing skill levels compete against each other. To train agents for this game, we propose a novel controller-assisted multi-agent reinforcement learning algorithm, C-MADDPG, which equips each agent with an ensemble of policies along with a supervised controller that, by selectively partitioning the sample space, triggers intelligent role division among the teammates. Using C-MADDPG as the underlying framework, we propose an incentive scheme for the weak team intended to make the final rewards of both teams equal. We find that, in spite of the incentive, the final reward of the weak team falls short of that of the strong team. On inspection, we realize that an overall incentive scheme for the weak team does not incentivize the weaker agents within that team to learn and improve. To offset this, we specifically incentivize the weaker player to learn and, as a result, observe that beyond an initial phase the weak team performs on par with the stronger team. The final goal of the paper is to formulate a dynamic incentive scheme that continuously balances the rewards of the two teams. This is achieved by devising an incentive scheme enriched with an RL agent that takes minimal information from the environment.
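
The idea of an incentive that continuously balances the two teams' rewards can be illustrated with a toy feedback loop. The sketch below is not the paper's method: the abstract describes an RL agent that sets the incentive, and here a simple proportional update rule is swapped in to show only the balancing objective. All names (`update_incentive`, `simulate`, `step_size`, `max_incentive`) and the fixed per-episode rewards are assumptions made for illustration.

```python
def update_incentive(incentive, weak_reward, strong_reward,
                     step_size=0.1, max_incentive=10.0):
    """Nudge the weak team's bonus toward closing the reward gap.

    Hypothetical proportional rule, standing in for the RL-based
    incentive agent mentioned in the abstract.
    """
    # Gap between the strong team's reward and the weak team's
    # reward after the current incentive has been added.
    gap = strong_reward - (weak_reward + incentive)
    new_incentive = incentive + step_size * gap
    # Keep the incentive non-negative and bounded.
    return min(max(new_incentive, 0.0), max_incentive)


def simulate(episodes=200, weak_base=2.0, strong_base=5.0):
    """Run the update with fixed (assumed) per-episode team rewards."""
    incentive = 0.0
    for _ in range(episodes):
        incentive = update_incentive(incentive, weak_base, strong_base)
    return incentive


if __name__ == "__main__":
    # With fixed rewards the incentive settles near the raw reward gap.
    print(round(simulate(), 4))
```

In this toy setting the incentive converges to the difference between the two base rewards. In the setting the abstract describes, the rewards themselves change as agents learn, which is presumably why a learned, dynamic scheme is used instead of a fixed rule.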
