Hierarchical Control for Cooperative Teams in Competitive Autonomous Racing

Paper Title

Hierarchical Control for Cooperative Teams in Competitive Autonomous Racing

Authors

Rishabh Saumil Thakkar, Aryaman Singh Samyal, David Fridovich-Keil, Zhe Xu, Ufuk Topcu

Abstract

We investigate the problem of autonomous racing among teams of cooperative agents that are subject to realistic racing rules. Our work extends previous research on hierarchical control in head-to-head autonomous racing by considering a generalized version of the problem while maintaining the two-level hierarchical control structure. A high-level tactical planner constructs a discrete game that encodes the complex rules using simplified dynamics to produce a sequence of target waypoints. The low-level path planner uses these waypoints as a reference trajectory and computes high-resolution control inputs by solving a simplified formulation of a racing game with a simplified representation of the realistic racing rules. We explore two approaches for the low-level path planner: training a multi-agent reinforcement learning (MARL) policy and solving a linear-quadratic Nash game (LQNG) approximation. We evaluate our controllers on simple and complex tracks against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show our hierarchical methods outperform the baselines in terms of race wins, overall team performance, and compliance with the rules. Qualitatively, we observe the hierarchical controllers mimic actions performed by expert human drivers such as coordinated overtaking, defending against multiple opponents, and long-term planning for delayed advantages.
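The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. As a labeled assumption, a dynamic program over a small lane-cost grid stands in for the high-level discrete game, and a single-agent LQR on a lateral double integrator stands in for the low-level MARL or LQNG planner. All names (`plan_lanes`, `lqr_gain`, `track`), the cost values, and the dynamics are hypothetical and chosen only for illustration.

```python
import numpy as np

def plan_lanes(cost, switch_penalty=1.0):
    """High-level stand-in (assumption): dynamic programming over a
    (segments x lanes) cost grid. Returns one lane index per segment that
    minimizes cell cost plus a penalty per lane changed."""
    n_seg, n_lane = cost.shape
    lanes = np.arange(n_lane)
    # move[i, j]: penalty for going from lane i to lane j between segments
    move = switch_penalty * np.abs(lanes[:, None] - lanes[None, :])
    value = cost[-1].astype(float)
    choice = np.zeros((n_seg - 1, n_lane), dtype=int)
    for k in range(n_seg - 2, -1, -1):
        total = move + value[None, :]
        choice[k] = np.argmin(total, axis=1)
        value = cost[k] + np.min(total, axis=1)
    lane = int(np.argmin(value))
    path = [lane]
    for k in range(n_seg - 1):
        lane = int(choice[k, lane])
        path.append(lane)
    return path

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion to an approximately
    stationary feedback gain K."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def track(waypoints, steps_per_wp=50, dt=0.1):
    """Low-level stand-in (assumption): LQR tracking of lateral waypoints
    with a double integrator, state = [lateral position, lateral velocity]."""
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt**2], [dt]])
    K = lqr_gain(A, B, np.diag([10.0, 1.0]), np.array([[0.1]]))
    x = np.array([float(waypoints[0]), 0.0])
    reached = []
    for wp in waypoints:
        ref = np.array([float(wp), 0.0])
        for _ in range(steps_per_wp):
            u = -K @ (x - ref)          # feedback on the tracking error
            x = A @ x + B @ u
        reached.append(float(x[0]))
    return reached

# Hypothetical scenario: lane 1 is the preferred line, but an opponent
# occupies it in segment 2 (modeled here as a very high cell cost).
cost = np.ones((4, 3))
cost[:, 1] = 0.0
cost[2, 1] = 100.0
lane_plan = plan_lanes(cost)
positions = track(lane_plan)
print(lane_plan, positions)
```

In this sketch the high-level output (a lane index per segment) plays the role of the target waypoints, and the low-level controller treats each one as a reference to track. The paper's planners additionally account for opponents, teammates, and racing rules at both levels, which this single-agent toy does not attempt.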
