Paper Title
Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems
Paper Authors
Abstract
When dealing with a series of imminent issues, humans can naturally concentrate on a subset of these issues by prioritizing them according to their contributions to motivational indices, e.g., the probability of winning a game. This idea of concentration offers insights into reinforcement learning of sophisticated Large-scale Multi-Agent Systems (LMAS) in which hundreds of agents participate. In such an LMAS, each agent receives a long series of entity observations at each step, which can overwhelm existing aggregation networks such as graph attention networks and cause inefficiency. In this paper, we propose a concentration network called ConcNet. First, ConcNet scores the observed entities considering several motivational indices, e.g., expected survival time and state value of the agents, and then ranks, prunes, and aggregates the encodings of the observed entities to extract features. Second, distinct from the well-known attention mechanism, ConcNet has a unique motivational subnetwork that explicitly considers the motivational indices when scoring the observed entities. Furthermore, we present a concentration policy gradient architecture that can learn effective policies in LMAS from scratch. Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.
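The score–rank–prune–aggregate pipeline described in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the scoring function (a softmax-weighted combination of motivational indices) and the weighted-mean aggregation are placeholder assumptions, not the paper's actual motivational subnetwork or architecture.

```python
import numpy as np

def concentrate(entity_encodings, motivational_indices, k):
    """Illustrative sketch of the concentration step:
    score observed entities using motivational indices (e.g., expected
    survival time, state value), then rank, prune to the top-k, and
    aggregate the surviving encodings into a single feature vector.
    """
    # Placeholder scoring: average the motivational indices per entity.
    # (The paper uses a learned motivational subnetwork here.)
    scores = motivational_indices.mean(axis=1)            # (n_entities,)

    # Rank entities by score (descending) and prune to the top-k.
    top_k = np.argsort(scores)[::-1][:k]

    # Aggregate the retained encodings with softmax weights over scores.
    s = scores[top_k]
    weights = np.exp(s - s.max())
    weights /= weights.sum()
    return weights @ entity_encodings[top_k]              # (encoding_dim,)

# Hypothetical usage: 100 observed entities, 8-dim encodings, 3 indices.
rng = np.random.default_rng(0)
encodings = rng.normal(size=(100, 8))
indices = rng.normal(size=(100, 3))
feature = concentrate(encodings, indices, k=16)
print(feature.shape)  # (8,)
```

The key difference from standard attention, as the abstract notes, is that the scores are driven by explicit motivational indices rather than by query-key similarity, and pruning keeps the computation bounded even when hundreds of entities are observed.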