Prediction-aware and Reinforcement Learning based Altruistic Cooperative Driving

Paper Title

Prediction-aware and Reinforcement Learning based Altruistic Cooperative Driving

Authors

Rodolfo Valiente, Mahdi Razzaghpour, Behrad Toghi, Ghayoor Shah, Yaser P. Fallah

Abstract

Autonomous vehicle (AV) navigation in the presence of human-driven vehicles (HVs) is challenging, as HVs continuously update their policies in response to AVs. To navigate safely in the presence of complex AV-HV social interactions, AVs must learn to predict these changes. Humans can navigate such challenging social interaction settings because they have intrinsic knowledge about other agents' behaviors and use it to forecast what might happen in the future. Inspired by humans, we give our AVs the capability of anticipating future states and leveraging prediction in a cooperative reinforcement learning (RL) decision-making framework to improve safety and robustness. In this paper, we propose an integration of two essential and earlier-presented components of AVs: social navigation and prediction. We formulate the AV decision-making process as an RL problem and seek optimal policies that produce socially beneficial results, using a prediction-aware planning and social-aware optimization RL framework. We also propose a Hybrid Predictive Network (HPN) that anticipates future observations. The HPN is used in a multi-step prediction chain to compute a window of predicted future observations for use by the value function network (VFN). Finally, a safe VFN is trained to optimize a social utility using a sequence of previous and predicted observations, and a safety prioritizer leverages the interpretable kinematic predictions to mask unsafe actions, constraining the RL policy. We compare our prediction-aware AV to state-of-the-art solutions and demonstrate performance improvements in terms of efficiency and safety in multiple simulated scenarios.
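
The abstract names two mechanisms without giving details: a multi-step prediction chain and the masking of unsafe actions based on kinematic predictions. The Python sketch below illustrates the general idea of each and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`rollout_predictions`, `is_safe`, `masked_greedy_action`), the one-step predictor passed in as `predict_next` (standing in for a learned network such as the HPN), the constant-speed lead vehicle model, and the thresholds `min_gap`, `horizon`, and `dt`.

```python
from typing import Callable, List, Sequence


def rollout_predictions(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    horizon: int,
) -> List[float]:
    """Multi-step prediction chain: each one-step prediction is fed back
    into the input window to produce the next prediction."""
    window = list(history)
    predicted = []
    for _ in range(horizon):
        nxt = predict_next(window)
        predicted.append(nxt)
        # Slide the window: drop the oldest entry, append the new prediction.
        window = window[1:] + [nxt]
    return predicted


def is_safe(
    gap: float,
    ego_speed: float,
    lead_speed: float,
    accel: float,
    horizon: float = 2.0,
    dt: float = 0.5,
    min_gap: float = 5.0,
) -> bool:
    """Hypothetical kinematic check: the ego vehicle applies a constant
    acceleration, the lead vehicle keeps a constant speed, and the action
    counts as safe if the gap never drops below min_gap (meters)."""
    steps = int(round(horizon / dt))
    for k in range(1, steps + 1):
        t = k * dt
        ego_travel = ego_speed * t + 0.5 * accel * t * t
        lead_travel = lead_speed * t
        if gap + lead_travel - ego_travel < min_gap:
            return False
    return True


def masked_greedy_action(
    q_values: Sequence[float],
    safe_mask: Sequence[bool],
    fallback: int,
) -> int:
    """Pick the highest-value action among those flagged safe; if none is
    safe, return a caller-chosen fallback index (for example, braking)."""
    candidates = [i for i, ok in enumerate(safe_mask) if ok]
    if not candidates:
        return fallback
    return max(candidates, key=lambda i: q_values[i])
```

In this sketch the mask is applied only at action-selection time, so the value estimates themselves are left untouched; whether the paper's safety prioritizer works this way is not stated in the abstract.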
