Paper Title

Learning to Drive Using Sparse Imitation Reinforcement Learning

Paper Authors

Yuci Han, Alper Yilmaz

Paper Abstract

In this paper, we propose Sparse Imitation Reinforcement Learning (SIRL), a hybrid end-to-end control policy that combines sparse expert driving knowledge with a reinforcement learning (RL) policy for the autonomous driving (AD) task in the CARLA simulation environment. The sparse expert is designed from hand-crafted rules that are suboptimal but provide a risk-averse strategy by enforcing experience for critical scenarios such as pedestrian and vehicle avoidance and traffic light detection. As has been demonstrated, training an RL agent from scratch is data-inefficient and time-consuming, particularly for the urban driving task, due to the complexity of situations stemming from the vast size of the state space. Our SIRL strategy addresses these problems by fusing the output distribution of the sparse expert policy with that of the RL policy to generate a composite driving policy. Guided by the sparse expert during the early training stage, the SIRL strategy accelerates training, keeps RL exploration from causing catastrophic outcomes, and ensures safe exploration. To some extent, the SIRL agent imitates the driving expert's behavior; at the same time, it continuously gains knowledge during training, so it keeps improving beyond the sparse expert and can surpass both the sparse expert and a traditional RL agent. We experimentally validate the efficacy of the proposed SIRL approach in a complex urban scenario within the CARLA simulator. We also compare the SIRL agent against a traditional RL approach in terms of risk-averse exploration and learning efficiency, and we demonstrate the SIRL agent's generalization ability by transferring its driving skill to unseen environments.
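
Below is a minimal sketch of the policy-fusion idea described in the abstract, assuming a discrete action space and a mixing weight that decays over training; the function name `fused_action_distribution`, the coefficient `alpha`, its schedule, and the example numbers are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch (assumptions, not the paper's formulation): blend a sparse expert
# policy's action distribution with an RL policy's distribution, with the
# expert weight `alpha` decayed over training so control gradually shifts
# from the risk-averse expert to the learned RL policy.
import numpy as np

def fused_action_distribution(expert_probs, rl_probs, alpha):
    """Return a composite distribution over the same discrete action set.

    expert_probs, rl_probs: 1-D arrays, each summing to 1.
    alpha: weight on the expert; high early in training, decayed toward 0.
    """
    mixed = alpha * expert_probs + (1.0 - alpha) * rl_probs
    return mixed / mixed.sum()  # renormalize against numerical drift

# Hypothetical example: near a pedestrian, the expert strongly prefers
# braking (index 0), while the untrained RL policy does not.
expert = np.array([0.90, 0.05, 0.05])  # [brake, throttle, steer]
rl = np.array([0.20, 0.60, 0.20])
for step, alpha in [(0, 0.9), (50_000, 0.5), (200_000, 0.1)]:
    probs = fused_action_distribution(expert, rl, alpha)
    print(f"step {step:>7}: fused probs = {probs.round(3)}")
```

The intuition behind this kind of fusion: with `alpha` high early on, the composite policy inherits the expert's risk-averse choices (e.g., braking near pedestrians), enabling safe exploration; as `alpha` decays, the RL policy takes over and can improve beyond the suboptimal expert.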
