Robustness and Adaptability of Reinforcement Learning based Cooperative Autonomous Driving in Mixed-autonomy Traffic

Paper Title

Robustness and Adaptability of Reinforcement Learning based Cooperative Autonomous Driving in Mixed-autonomy Traffic

Authors

Rodolfo Valiente, Behrad Toghi, Ramtin Pedarsani, Yaser P. Fallah

Abstract

Building autonomous vehicles (AVs) is a complex problem, but enabling them to operate in the real world, where they will be surrounded by human-driven vehicles (HVs), is extremely challenging. Prior work has shown that it is possible to create inter-agent cooperation among a group of AVs that follow a social utility. Such altruistic AVs can form alliances and affect the behavior of HVs to achieve socially desirable outcomes. We identify two major challenges in the coexistence of AVs and HVs. First, the social preferences and individual traits of a given human driver, e.g., selflessness and aggressiveness, are unknown to an AV, and it is almost impossible to infer them in real time during a short AV-HV interaction. Second, unlike AVs, which are expected to follow a policy, HVs do not necessarily follow a stationary policy and are therefore extremely hard to predict. To alleviate these challenges, we formulate the mixed-autonomy problem as a multi-agent reinforcement learning (MARL) problem and propose a decentralized framework and reward function for training cooperative AVs. Our approach enables AVs to learn the decision-making of HVs implicitly from experience and optimizes for a social utility while prioritizing safety and allowing adaptability, making altruistic AVs robust to different human behaviors and constraining them to a safe action space. Finally, we investigate the robustness, safety, and sensitivity of AVs to various HV behavioral traits and present the settings in which the AVs can learn cooperative policies that are adaptable to different situations.
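
The abstract does not give the reward function or say how the safe action space is enforced. The following is only an illustrative sketch of the two ideas it names, a reward that optimizes a social utility and action selection restricted to safe actions. The cosine/sine weighting, the `altruism_angle` parameter, and the names `social_reward` and `pick_safe_action` are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence


def social_reward(ego_reward: float,
                  other_rewards: Sequence[float],
                  altruism_angle: float) -> float:
    """Blend an AV's own reward with the mean reward of nearby vehicles.

    Hypothetical weighting: altruism_angle = 0 is purely egoistic,
    altruism_angle = pi/2 is purely altruistic.
    """
    if other_rewards:
        others = sum(other_rewards) / len(other_rewards)
    else:
        others = 0.0
    return (math.cos(altruism_angle) * ego_reward
            + math.sin(altruism_angle) * others)


def pick_safe_action(action_values: Sequence[float],
                     is_safe: Sequence[bool]) -> int:
    """Return the index of the highest-valued action among the safe ones.

    Hypothetical stand-in for constraining an agent to a safe action space.
    """
    candidates = [i for i, ok in enumerate(is_safe) if ok]
    if not candidates:
        raise ValueError("no safe action available")
    return max(candidates, key=lambda i: action_values[i])


if __name__ == "__main__":
    # An AV that weights its own progress and its neighbors' equally.
    r = social_reward(ego_reward=1.0, other_rewards=[3.0, 5.0],
                      altruism_angle=math.pi / 4)
    print(f"blended reward: {r:.3f}")
    # The best-valued action (index 1) is unsafe, so it is skipped.
    print("chosen action:", pick_safe_action([0.1, 0.9, 0.5],
                                              [True, False, True]))
```

In this sketch each AV computes its reward from locally observed quantities only, which is one simple way to read "decentralized"; the paper's actual framework may differ.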
