Paper Title

Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning

Paper Authors

Branka Mirchevska, Moritz Werling, Joschka Boedecker

Paper Abstract

Implementing an autonomous vehicle that is able to output feasible, smooth, and efficient trajectories is a long-standing challenge. Several approaches have been considered, falling roughly into two categories: rule-based and learning-based. Rule-based approaches, while guaranteeing safety and feasibility, fall short when it comes to long-term planning and generalization. Learning-based approaches can account for long-term planning and generalize to unseen situations, but may fail to achieve the smoothness, safety, and feasibility that rule-based approaches ensure. Combining the two approaches is therefore a natural step towards getting the best compromise of both. We propose a Reinforcement Learning-based approach that learns target trajectory parameters for fully autonomous driving on highways. The trained agent outputs continuous trajectory parameters, from which a feasible polynomial-based trajectory is generated and executed. We compare the performance of our agent against four other highway driving agents. The experiments are conducted in the SUMO simulator, taking into consideration various realistic, dynamically changing highway scenarios, including surrounding vehicles with different driver behaviors. We demonstrate that our agent, trained offline on randomly collected data, learns to drive smoothly, achieving velocities as close as possible to the desired velocity, while outperforming the other agents. Code, training data, and details are available at: https://nrgit.informatik.uni-freiburg.de/branka.mirchevska/offline-rl-tp.
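The abstract does not spell out the trajectory parameterization, but a common choice in highway trajectory planning, and the one suggested by "polynomial-based trajectory," is a quintic polynomial interpolating between the vehicle's current longitudinal state and an agent-chosen target state. The sketch below is a minimal illustration under that assumption; the function names, the quintic order, and the boundary conditions are illustrative, not taken from the paper.

import numpy as np

def quintic_coeffs(s0, v0, a0, sT, vT, aT, T):
    # Coefficients c0..c5 of s(t) = sum_i c_i * t^i matching the start
    # state (s0, v0, a0) at t = 0 and the target state (sT, vT, aT) at t = T.
    c0, c1, c2 = s0, v0, a0 / 2.0
    # The remaining three coefficients follow from the end conditions,
    # which form a 3x3 linear system in (c3, c4, c5).
    A = np.array([
        [T**3,     T**4,      T**5],
        [3 * T**2, 4 * T**3,  5 * T**4],
        [6 * T,    12 * T**2, 20 * T**3],
    ])
    b = np.array([
        sT - (c0 + c1 * T + c2 * T**2),  # position residual at t = T
        vT - (c1 + 2 * c2 * T),          # velocity residual at t = T
        aT - 2 * c2,                     # acceleration residual at t = T
    ])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4, c5])

def position(coeffs, t):
    # Evaluate the longitudinal position along the trajectory at time t.
    return sum(c * t**i for i, c in enumerate(coeffs))

# Hypothetical example: the agent asks for 30 m/s (from 25 m/s, zero
# acceleration) to be reached smoothly within 5 s at position sT.
coeffs = quintic_coeffs(s0=0.0, v0=25.0, a0=0.0, sT=137.5, vT=30.0, aT=0.0, T=5.0)
print(position(coeffs, 2.5))

With a building block like this, the RL agent only has to output a few continuous parameters (e.g., a target velocity and time horizon), and a downstream planner turns them into a smooth, dynamically feasible trajectory, matching the division of labor between learning and rule-based planning that the abstract describes.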
