Paper Title
Explainable and Safe Reinforcement Learning for Autonomous Air Mobility
Paper Authors
Paper Abstract
Increasing traffic demands, higher levels of automation, and communication enhancements provide novel design opportunities for future air traffic controllers (ATCs). This article presents a novel deep reinforcement learning (DRL) controller to aid conflict resolution for autonomous free flight. Although DRL has achieved important advancements in this field, existing works pay little attention to the explainability and safety issues related to DRL controllers, particularly their safety under adversarial attacks. To address these two issues, we design a fully explainable DRL framework wherein we: 1) decompose the coupled Q-value learning model into separate safety-awareness and efficiency (reach the target) models; and 2) use information from surrounding intruders as inputs, eliminating the need for a central controller. In our simulated experiments, we show that by decoupling safety-awareness and efficiency, we can exceed performance on free flight control tasks while dramatically improving explainability in practice. In addition, the safety Q-learning module provides rich information about the safety situation of the environment. To study safety under adversarial attacks, we additionally propose an adversarial attack strategy that can impose both safety-oriented and efficiency-oriented attacks. The adversary aims to minimize safety/efficiency by attacking the agent at only a few time steps. In the experiments, our attack strategy causes as many collisions as the uniform attack (i.e., attacking at every time step) while attacking the agent only one-fourth as often, which provides insights into the capabilities and restrictions of DRL in future ATC designs. The source code is publicly available at https://github.com/WLeiiiii/Gym-ATC-Attack-Project.
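The abstract does not specify how the decoupled safety and efficiency Q-values are combined at decision time. A minimal sketch of one plausible combination rule, assuming two separately learned Q-value vectors per state and a hypothetical `safety_threshold` parameter (both the function name and the masking rule are illustrative assumptions, not the paper's actual method):

```python
import numpy as np

def select_action(q_safety, q_efficiency, safety_threshold=0.0):
    """Pick the most efficient action among those the safety module
    deems acceptable; fall back to the safest action if none qualify.

    q_safety, q_efficiency: per-action Q-value arrays from the two
    decoupled learning modules (stand-ins for network outputs here).
    """
    safe_mask = q_safety >= safety_threshold
    if not safe_mask.any():
        # No action passes the safety threshold: maximize safety alone.
        return int(np.argmax(q_safety))
    # Among safe actions, choose the one with the highest efficiency value.
    masked_efficiency = np.where(safe_mask, q_efficiency, -np.inf)
    return int(np.argmax(masked_efficiency))

# Example: 3 actions; action 2 is most efficient but judged unsafe,
# so the best safe action (index 1) is selected instead.
q_s = np.array([0.2, 0.5, -0.8])   # safety Q-values
q_e = np.array([0.1, 0.6, 0.9])    # efficiency Q-values
print(select_action(q_s, q_e))     # -> 1
```

Keeping the two value functions separate is what makes the decision explainable: at any time step one can report whether the chosen action was safety-constrained or purely efficiency-driven.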