Paper Title
Safe Reinforcement Learning Using Black-Box Reachability Analysis
Paper Authors
Paper Abstract
Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, a trajectory-tracking point mass, and a hexarotor in wind with an unsafe set adjacent to the area of highest reward.
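The combination of components (1) and (2) above — predicting future states with an ensemble of learned dynamics models and checking the resulting set against obstacles — can be illustrated with a minimal sketch. Everything here is a simplified stand-in: the `LinearModel` ensemble members replace the paper's online-trained neural networks, the axis-aligned boxes replace its data-driven reachable sets and polytopes, and the non-differentiable overlap test replaces its differentiable collision check.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearModel:
    """Stand-in for one ensemble member (a trained neural network in BRSL)."""
    def __init__(self, dim):
        # Slightly perturbed dynamics mimic disagreement across the ensemble.
        self.A = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
        self.B = 0.1 * np.eye(dim)

    def predict(self, state, action):
        return self.A @ state + self.B @ action

def rollout_bounds(models, state, actions):
    """Roll each ensemble member forward; return per-step min/max boxes
    as a crude data-driven over-approximation of where the robot could be."""
    states = [state.copy() for _ in models]
    boxes = []
    for a in actions:
        states = [m.predict(s, a) for m, s in zip(models, states)]
        stacked = np.stack(states)
        boxes.append((stacked.min(axis=0), stacked.max(axis=0)))
    return boxes

def box_intersects(box, lo, hi):
    """Axis-aligned overlap test between a predicted box and an obstacle box."""
    blo, bhi = box
    return bool(np.all(bhi >= lo) and np.all(blo <= hi))

# Candidate plan: drive in +x toward a hypothetical obstacle occupying
# x in [0.9, 1.2], y in [-0.1, 0.1]. A planner would reject or correct
# the action sequence if any predicted box touches the obstacle.
ensemble = [LinearModel(2) for _ in range(5)]
plan = [np.array([1.0, 0.0])] * 10
boxes = rollout_bounds(ensemble, np.zeros(2), plan)
unsafe = any(box_intersects(b, np.array([0.9, -0.1]), np.array([1.2, 0.1]))
             for b in boxes)
print("plan flagged unsafe:", unsafe)
```

In BRSL itself the collision check is differentiable, so an unsafe action can be corrected by gradient-based optimization rather than simply rejected; this sketch only shows the detection side.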