Reinforcement Learning for Joint V2I Network Selection and Autonomous Driving Policies

Paper Title

Reinforcement Learning for Joint V2I Network Selection and Autonomous Driving Policies

Authors

Yan, Zijiang; Tabassum, Hina

Abstract

Vehicle-to-Infrastructure (V2I) communication is becoming critical for the enhanced reliability of autonomous vehicles (AVs). However, uncertainties in road traffic and in the AVs' wireless connections can severely impair timely decision-making. It is thus critical to simultaneously optimize the AVs' network selection and driving policies in order to minimize road collisions while maximizing the communication data rates. In this paper, we develop a reinforcement learning (RL) framework to characterize efficient network selection and autonomous driving policies in a multi-band vehicular network (VNet) operating on conventional sub-6GHz spectrum and Terahertz (THz) frequencies. The proposed framework is designed to (i) maximize the traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration) from an autonomous driving perspective, and (ii) maximize the data rates and minimize handoffs by jointly controlling the vehicle's motion dynamics and network selection from a telecommunication perspective. We cast this problem as a Markov Decision Process (MDP) and develop a deep Q-learning based solution to optimize actions such as acceleration, deceleration, lane changes, and AV-base station assignments for a given AV's state. The AV's state is defined based on the velocities and communication channel states of the AVs. Numerical results demonstrate interesting insights related to the inter-dependency of the vehicle's motion dynamics, handoffs, and the communication data rate. The proposed policies enable AVs to adopt safe driving behaviors with improved connectivity.
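
To make the MDP formulation in the abstract more concrete, here is a minimal sketch of a joint action space (driving manoeuvre plus base station choice) with a Q-learning update. It is not the authors' implementation. The abstract describes a deep Q-learning solution; this sketch substitutes a tabular Q-table to stay short. The action names, the two base station labels, the reward weights and penalties, and the assumption that speed and data rate are normalized to [0, 1] are all hypothetical choices made for illustration.

```python
import itertools
import random

# Hypothetical discretization; the abstract does not list exact action names.
DRIVING_ACTIONS = ["accelerate", "decelerate", "keep_lane",
                   "change_left", "change_right"]
BASE_STATIONS = ["rf_bs", "thz_bs"]  # assumed: one sub-6GHz BS, one THz BS

# A joint action pairs one driving manoeuvre with one base station choice.
ACTIONS = list(itertools.product(DRIVING_ACTIONS, BASE_STATIONS))


def reward(collided, speed, data_rate, handoff,
           w_speed=0.5, w_rate=0.5,
           handoff_penalty=0.2, collision_penalty=10.0):
    """Illustrative scalar reward mixing driving and communication terms.

    speed and data_rate are assumed to be normalized to [0, 1].
    All weights and penalties are made-up values.
    """
    if collided:
        return -collision_penalty
    r = w_speed * speed + w_rate * data_rate
    if handoff:
        r -= handoff_penalty
    return r


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; unseen (state, action) pairs default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    q = {}
    state, next_state = "s0", "s1"  # placeholder state labels
    action = epsilon_greedy(q, state, epsilon=0.1, rng=rng)
    r = reward(collided=False, speed=0.8, data_rate=0.6, handoff=True)
    q_update(q, state, action, r, next_state)
    print(action, q[(state, action)])
```

Folding the base station choice into the action tuple means a single Q-value scores both the manoeuvre and the network selection, which mirrors the joint optimization described in the abstract. A deep Q-network would replace the dictionary with a function approximator over a continuous state built from velocities and channel states.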

Paper Title