Paper Title

Graded-Q Reinforcement Learning with Information-Enhanced State Encoder for Hierarchical Collaborative Multi-Vehicle Pursuit

Paper Authors

Yiying Yang, Xinhang Li, Zheng Yuan, Qinwen Wang, Chen Xu, Lin Zhang

Paper Abstract

The multi-vehicle pursuit (MVP), as a problem abstracted from various real-world scenarios, is becoming a hot research topic in Intelligent Transportation Systems (ITS). The combination of Artificial Intelligence (AI) and connected vehicles has greatly promoted the research development of MVP. However, existing works on MVP pay little attention to the importance of information exchange and cooperation among pursuing vehicles in complex urban traffic environments. This paper proposes a graded-Q reinforcement learning with information-enhanced state encoder (GQRL-IESE) framework to address this hierarchical collaborative multi-vehicle pursuit (HCMVP) problem. In the GQRL-IESE, a cooperative graded-Q scheme is proposed to facilitate the decision-making of pursuing vehicles and improve pursuing efficiency. Each pursuing vehicle further uses a deep Q-network (DQN) to make decisions based on its encoded state. A coordinated Q optimizing network then adjusts the individual decisions according to the current environmental traffic information to obtain the globally optimal action set. In addition, an information-enhanced state encoder is designed to extract critical information from multiple perspectives and uses an attention mechanism to assist each pursuing vehicle in effectively determining its target. Extensive experimental results based on SUMO indicate that the total timesteps of the proposed GQRL-IESE are on average 47.64% fewer than those of other methods, demonstrating the excellent pursuing efficiency of the GQRL-IESE. Code is open-sourced at https://github.com/ANT-ITS/GQRL-IESE.
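The abstract describes two learnable components per pursuing vehicle: an attention-based information-enhanced state encoder and a DQN that maps the encoded state to Q-values over discrete actions. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the class names, feature dimensions, number of candidate targets, and action count are all assumptions for illustration, and the coordinated Q optimizing network that refines the individual decisions is omitted.

# Hypothetical sketch of the two per-vehicle components described in the abstract.
# Shapes, names, and hyperparameters are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn


class InformationEnhancedStateEncoder(nn.Module):
    """Attention-based encoder: the ego (pursuing) vehicle attends over candidate
    target/traffic features to build its encoded state (hypothetical shapes)."""

    def __init__(self, feat_dim: int = 16, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, ego_feat: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
        # ego_feat: (batch, feat_dim); target_feats: (batch, num_targets, feat_dim)
        q = self.proj(ego_feat).unsqueeze(1)   # query from the ego vehicle
        kv = self.proj(target_feats)           # keys/values from candidate targets
        enc, _ = self.attn(q, kv, kv)          # attend over targets
        return enc.squeeze(1)                  # (batch, embed_dim)


class PursuitDQN(nn.Module):
    """Per-vehicle DQN mapping the encoded state to Q-values over discrete actions
    (e.g., candidate next road segments); the action count is an assumption."""

    def __init__(self, embed_dim: int = 64, num_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, encoded_state: torch.Tensor) -> torch.Tensor:
        return self.net(encoded_state)


if __name__ == "__main__":
    encoder, dqn = InformationEnhancedStateEncoder(), PursuitDQN()
    ego = torch.randn(2, 16)          # two pursuing vehicles
    targets = torch.randn(2, 5, 16)   # five candidate targets each
    q_values = dqn(encoder(ego, targets))
    greedy_actions = q_values.argmax(dim=-1)   # individual decisions before coordination
    print(q_values.shape, greedy_actions)

In the paper's framework these per-vehicle greedy decisions would then be adjusted by the coordinated Q optimizing network using global traffic information, which is not sketched here.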
