Paper Title
Multi-Agent Broad Reinforcement Learning for Intelligent Traffic Light Control
Authors
Abstract
An Intelligent Traffic Light Control System (ITLCS) is a typical Multi-Agent System (MAS) comprising multiple roads and traffic lights. Constructing a MAS model of the ITLCS is the basis for alleviating traffic congestion. Existing MAS approaches are largely based on Multi-Agent Deep Reinforcement Learning (MADRL). Although the Deep Neural Networks (DNNs) used in MADRL are effective, their training time is long and their parameters are difficult to trace. Recently, Broad Learning Systems (BLS) have provided an alternative to learning with deep neural networks by using a flat network. Moreover, Broad Reinforcement Learning (BRL) extends BLS to Single-Agent Deep Reinforcement Learning (SADRL) problems with promising results. However, BRL does not address the intricate structures and interactions of multiple agents. Motivated by the features of MADRL and the limitations of BRL, we propose a Multi-Agent Broad Reinforcement Learning (MABRL) framework to explore the function of BLS in MAS. First, unlike most existing MADRL approaches, which use a series of deep neural network structures, we model each agent with a broad network. Then, we introduce a dynamic self-cycling interaction mechanism to determine the "3W" information: when to interact, which agents to consider, and what information to transmit. Finally, we conduct experiments in an intelligent traffic light control scenario. We compare MABRL with six other approaches, and experimental results on three datasets verify its effectiveness.
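The abstract contrasts a flat broad network with a deep one but gives no architectural details. As a rough illustration of the general BLS idea only, the sketch below builds one wide layer from randomly weighted feature nodes and enhancement nodes and learns just the output weights in closed form by ridge regression. It is a supervised toy example, not the MABRL framework or its reinforcement learning loop; the function name `fit_broad_network`, the node counts, the `tanh` activation, and the ridge value are all assumptions made for illustration.

```python
import numpy as np


def fit_broad_network(X, Y, n_feature=20, n_enhance=40, ridge=1e-3, seed=0):
    """Fit a simplified broad (flat) network and return a predict function.

    Hypothetical helper for illustration; sizes and activation are assumptions.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random weights for feature and enhancement nodes; they are never trained.
    W_f = rng.standard_normal((d, n_feature))
    b_f = rng.standard_normal(n_feature)
    W_e = rng.standard_normal((n_feature, n_enhance))
    b_e = rng.standard_normal(n_enhance)

    def expand(X_in):
        Z = X_in @ W_f + b_f          # feature nodes: random linear mapping
        H = np.tanh(Z @ W_e + b_e)    # enhancement nodes: nonlinear expansion
        return np.hstack([Z, H])      # single flat layer [Z | H]

    A = expand(X)
    # Only the output weights are learned, via closed-form ridge regression.
    W_out = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y)
    return lambda X_new: expand(X_new) @ W_out


# Toy usage: fit a linear target and report the training error.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
Y = X @ np.array([[1.0], [-2.0], [0.5]])
predict = fit_broad_network(X, Y)
print("training MSE:", float(np.mean((predict(X) - Y) ** 2)))
```

Because only `W_out` is solved for, fitting needs a single linear solve rather than iterative backpropagation, which is the kind of training-time saving that motivates replacing deep networks with broad ones.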