Paper Title

Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving

Authors

Ekim Yurtsever, Linda Capito, Keith Redmill, Umit Ozguner

Abstract


Automated driving in urban settings is challenging. Human participant behavior is difficult to model, and conventional, rule-based Automated Driving Systems (ADSs) tend to fail when they face unmodeled dynamics. On the other hand, the more recent, end-to-end Deep Reinforcement Learning (DRL) based model-free ADSs have shown promising results. However, pure learning-based approaches lack the hard-coded safety measures of model-based controllers. Here we propose a hybrid approach for integrating a path planning pipeline into a vision-based DRL framework to alleviate the shortcomings of both worlds. In summary, the DRL agent is trained to follow the path planner's waypoints as closely as possible. The agent learns this policy by interacting with the environment. The reward function contains two major terms: the penalty for straying away from the path planner and the penalty for having a collision. The latter has precedence in the form of having a significantly greater numerical value. Experimental results show that the proposed method can plan its path and navigate between randomly chosen origin-destination points in CARLA, a dynamic urban simulation environment. Our code is open-source and available online.
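The two-term reward structure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight values and the function/parameter names (`reward`, `W_DEVIATION`, `W_COLLISION`) are hypothetical assumptions; the paper only states that the collision penalty is significantly larger in magnitude than the deviation penalty.

```python
import math

# Hypothetical weights -- the paper specifies only that the collision
# penalty dominates the path-deviation penalty numerically.
W_DEVIATION = 1.0
W_COLLISION = 100.0

def reward(agent_pos, nearest_waypoint, collided):
    """Reward = -(penalty for straying from the planner's waypoints)
                -(much larger penalty if a collision occurred)."""
    # Euclidean distance from the agent to the nearest planned waypoint.
    deviation = math.hypot(agent_pos[0] - nearest_waypoint[0],
                           agent_pos[1] - nearest_waypoint[1])
    r = -W_DEVIATION * deviation
    if collided:
        r -= W_COLLISION  # collision term takes precedence by magnitude
    return r
```

Under this sketch, driving exactly on the planned path with no collision yields a reward of 0, while any collision drops the reward by a large constant regardless of path-tracking accuracy.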
