Integrated Decision and Control for High-Level Automated Vehicles by Mixed Policy Gradient and Its Experiment Verification

Paper Title

Integrated Decision and Control for High-Level Automated Vehicles by Mixed Policy Gradient and Its Experiment Verification

Authors

Guan, Yang; Tang, Liye; Li, Chuanxiao; Li, Shengbo Eben; Ren, Yangang; Wei, Junqing; Zhang, Bo; Li, Keqiang

Abstract

Self-evolution is indispensable for realizing fully autonomous driving. This paper presents a self-evolving decision-making system based on Integrated Decision and Control (IDC), an advanced framework built on reinforcement learning (RL). First, an RL algorithm called constrained mixed policy gradient (CMPG) is proposed to consistently upgrade the driving policy of the IDC. It adapts the mixed policy gradient (MPG) with the penalty method so that it can solve constrained optimization problems using both data and the model. Second, an attention-based encoding (ABE) method is designed to tackle the state representation issue. It introduces an embedding network for feature extraction and a weighting network for feature fusion, achieving order-insensitive encoding and distinguishing the importance of road users. Finally, by fusing CMPG and ABE, we develop the first data-driven decision and control system under the IDC architecture and deploy it on a fully functional self-driving vehicle running in daily operation. Experimental results show that, boosted by data, the system achieves better driving ability than model-based methods. It also demonstrates safe, efficient, and smart driving behavior in various complex scenes at a signalized intersection with real mixed traffic flow.
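The order-insensitive encoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' ABE network: the one-layer `embed` function, the weight values, the three-number feature layout, and all function names are hypothetical, chosen only to show why embedding each road user, scoring it, and taking a softmax-weighted sum yields a fixed-size vector that does not depend on the order in which road users are listed.

```python
import math

def embed(features, weights):
    """Hypothetical one-layer embedding: tanh(W @ x) for a single road user."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

def attention_encode(road_users, embed_weights, score_weights):
    """Fuse a variable-length set of road users into one fixed-size vector."""
    embeddings = [embed(u, embed_weights) for u in road_users]
    # Weighting step: one scalar score per road user, normalised by softmax.
    scores = [sum(s * e for s, e in zip(score_weights, emb)) for emb in embeddings]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Fusion step: a weighted sum over users, which ignores input order.
    dim = len(embed_weights)
    fused = [sum(a * emb[i] for a, emb in zip(attn, embeddings)) for i in range(dim)]
    return fused, attn

# Illustrative numbers only: each user is (relative x, relative y, speed).
EMBED_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
SCORE_W = [1.0, -1.0]
users = [[1.0, 2.0, 0.5], [-3.0, 0.5, 1.2], [0.2, -1.0, 2.0]]

fused, attn = attention_encode(users, EMBED_W, SCORE_W)
fused_shuffled, _ = attention_encode(users[::-1], EMBED_W, SCORE_W)
```

Reversing the list of road users changes the order of the attention weights but leaves the fused vector unchanged up to floating-point rounding, and the attention weights themselves indicate how much each road user contributes to the encoding.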

