Paper Title
Fever Basketball: A Complex, Flexible, and Asynchronized Sports Game Environment for Multi-agent Reinforcement Learning
Paper Authors
Paper Abstract
The development of deep reinforcement learning (DRL) has benefited from the emergence of a variety of game environments, such as board games, RTS, FPS, and MOBA games, in which new challenging problems are posed and new algorithms can be tested safely and quickly. However, many existing environments lack complexity and flexibility, and assume that actions are executed synchronously in multi-agent settings, which limits their value. We introduce Fever Basketball, a novel reinforcement learning environment in which agents are trained to play a basketball game. It is a complex and challenging environment that supports multiple characters, multiple positions, and both single-agent and multi-agent player control modes. In addition, to better simulate real-world basketball games, the execution time of actions differs among players, which makes Fever Basketball a novel asynchronized environment. We evaluate commonly used multi-agent algorithms, for both independent learners and joint-action learners, in three game scenarios of varying difficulty, and heuristically propose two baseline methods to diminish the extra non-stationarity brought by asynchronism in the Fever Basketball benchmark. Besides, we propose an integrated curricula training (ICT) framework to better handle the Fever Basketball problem, which includes several game-rule-based cascading curricula learners and a coordination curricula switcher focused on enhancing coordination within the team. The results show that the game remains challenging and can serve as a benchmark environment for studies of long-time-horizon tasks, sparse rewards, credit assignment, and non-stationarity in multi-agent settings.
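The asynchronism described in the abstract, where each player's action takes a different amount of time to execute, is the environment's distinguishing feature. Below is a minimal sketch of that idea in a gym-style loop; it is not the actual Fever Basketball API, and the action set, durations, and class names are all assumed for illustration.

```python
# Hypothetical sketch of an asynchronized multi-agent loop: each agent's action
# occupies a different number of environment ticks, so agents reach new decision
# points at different times. Not the Fever Basketball API; everything here is
# illustrative only.
import random

ACTION_DURATIONS = {"move": 1, "pass": 2, "shoot": 4, "defend": 3}  # ticks (assumed)

class AsyncToyEnv:
    def __init__(self, n_agents=3):
        self.n_agents = n_agents
        self.busy_until = [0] * n_agents  # tick at which each agent's current action finishes
        self.tick = 0

    def agents_needing_action(self):
        # Only agents whose previous action has completed get a new decision point.
        return [i for i in range(self.n_agents) if self.busy_until[i] <= self.tick]

    def step(self, actions):
        # `actions` maps agent index -> chosen action, supplied only for ready agents.
        for i, a in actions.items():
            self.busy_until[i] = self.tick + ACTION_DURATIONS[a]
        self.tick += 1
        reward, done = 0.0, self.tick >= 20  # placeholder dynamics
        return reward, done

env = AsyncToyEnv()
done = False
while not done:
    ready = env.agents_needing_action()
    chosen = {i: random.choice(list(ACTION_DURATIONS)) for i in ready}
    _, done = env.step(chosen)
    print(f"tick {env.tick}: agents {ready} acted")
```

Because teammates' decision points no longer align, an agent's perceived transition dynamics depend on what the other, still-busy agents are mid-way through executing; this is the extra non-stationarity the abstract's two baseline methods aim to diminish.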