Model-based graph reinforcement learning for inductive traffic signal control

Paper title

Model-based graph reinforcement learning for inductive traffic signal control

Authors

Devailly, François-Xavier; Larocque, Denis; Charlin, Laurent

Abstract

Most reinforcement learning methods for adaptive-traffic-signal-control require training from scratch to be applied on any new intersection or after any modification to the road network, traffic distribution, or behavioral constraints experienced during training. Considering 1) the massive amount of experience required to train such methods, and 2) that experience must be gathered by interacting in an exploratory fashion with real road-network-users, such a lack of transferability limits experimentation and applicability. Recent approaches enable learning policies that generalize for unseen road-network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between the learning of cyclic (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by allowing a generalization to the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.
