TC-Driver: Trajectory Conditioned Driving for Robust Autonomous Racing - A Reinforcement Learning Approach

Paper Title

TC-Driver: Trajectory Conditioned Driving for Robust Autonomous Racing -- A Reinforcement Learning Approach

Authors

Edoardo Ghignone, Nicolas Baumann, Mike Boss, Michele Magno

Abstract

Autonomous racing is becoming popular among academic and industry researchers as a test for general autonomous driving, because it pushes perception, planning, and control algorithms to their limits. While traditional control methods such as MPC are capable of generating an optimal control sequence at the edge of the vehicle's physical controllability, these methods are sensitive to the accuracy of the modeling parameters. This paper presents TC-Driver, an RL approach for robust control in autonomous racing. In particular, the TC-Driver agent is conditioned on a trajectory generated by an arbitrary traditional high-level planner. The proposed TC-Driver addresses tire parameter modeling inaccuracies by exploiting the heuristic nature of RL while leveraging the reliability of traditional planning methods in a hierarchical control structure. We train the agent under varying tire conditions, allowing it to generalize to different model parameters, with the aim of increasing the racing capabilities of the system in practice. The proposed RL method outperforms a non-learning-based MPC with a 2.7 lower crash ratio in a model mismatch setting, underlining its robustness to parameter discrepancies. In addition, the average RL inference duration is 0.25 ms compared to the average MPC solving time of 11.5 ms, yielding a nearly 40-fold speedup and allowing complex control to be deployed on computationally constrained devices. Lastly, we show that the frequently utilized end-to-end RL architecture, in which a control policy is learned directly from sensory input, is not well suited to either model mismatch robustness or track generalization. Our realistic simulations show that TC-Driver achieves a 6.7-fold and 3-fold lower crash ratio under model mismatch and track generalization settings, respectively, while simultaneously achieving lower lap times than an end-to-end approach, demonstrating the viability of TC-Driver for robust autonomous racing.
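
As a quick check of the timing claim, the sketch below divides the two average durations quoted in the abstract. The variable names are illustrative only; the two numbers are the only values taken from the source. Taken at face value, the ratio comes to 46, so the abstract's "nearly 40-fold" reads as a conservative rounding.

```python
# Average durations as quoted in the abstract (milliseconds).
mpc_solve_ms = 11.5      # average MPC solving time
rl_inference_ms = 0.25   # average RL inference duration

# Speedup of RL inference relative to MPC solving.
speedup = mpc_solve_ms / rl_inference_ms
print(f"RL inference is about {speedup:.0f}x faster than the MPC solve")
```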
