Paper Title

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning

Paper Authors

Yixuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

Paper Abstract

In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification after the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in finding feasible controllers with barrier functions and Lyapunov functions that ensure system safety and stability.
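
To give a rough sense of the joint optimization idea described in the abstract, the sketch below trains a controller and a barrier-function certificate together by backpropagating through one combined objective. This is only a minimal, single-level illustration under assumed toy dynamics and hinge-style barrier penalties; it is not the paper's bilevel formulation or implementation, and all names (PolicyNet, BarrierNet, dynamics, the loss weights) are hypothetical.

```python
# Illustrative sketch (not the authors' code): jointly optimizing a policy and a
# barrier-function candidate so that both receive gradients from one objective.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):          # hypothetical controller pi_theta(x)
    def __init__(self, x_dim=2, u_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 32), nn.Tanh(), nn.Linear(32, u_dim))
    def forward(self, x):
        return self.net(x)

class BarrierNet(nn.Module):         # hypothetical barrier candidate B_phi(x)
    def __init__(self, x_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 32), nn.Tanh(), nn.Linear(32, 1))
    def forward(self, x):
        return self.net(x)

def dynamics(x, u):
    # Toy linear model x_{t+1} = A x_t + B u_t, standing in for the system model.
    A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
    B = torch.tensor([[0.0], [0.1]])
    return x @ A.T + u @ B.T

def joint_loss(policy, barrier, x_init, x_unsafe, horizon=20, eps=0.1, lam=1.0):
    """Task cost plus hinge penalties for the barrier conditions:
    B <= 0 on initial states, B >= eps on unsafe states, and B non-increasing
    along closed-loop rollouts (an assumed simplification of the certificate)."""
    x, cost, decrease_pen = x_init, 0.0, 0.0
    for _ in range(horizon):
        u = policy(x)
        x_next = dynamics(x, u)
        cost = cost + (x_next ** 2).sum(dim=1).mean()            # control objective
        decrease_pen = decrease_pen + torch.relu(barrier(x_next) - barrier(x)).mean()
        x = x_next
    init_pen = torch.relu(barrier(x_init)).mean()                # B <= 0 on X_0
    unsafe_pen = torch.relu(eps - barrier(x_unsafe)).mean()      # B >= eps on X_u
    return cost + lam * (init_pen + unsafe_pen + decrease_pen)

policy, barrier = PolicyNet(), BarrierNet()
opt = torch.optim.Adam(list(policy.parameters()) + list(barrier.parameters()), lr=1e-3)
x_init = torch.rand(64, 2) * 0.2 - 0.1        # samples from an assumed initial set
x_unsafe = torch.rand(64, 2) * 0.5 + 1.5      # samples from an assumed unsafe set
for step in range(200):
    opt.zero_grad()
    loss = joint_loss(policy, barrier, x_init, x_unsafe)
    loss.backward()
    opt.step()
```

In this toy setup the certificate penalties shape the controller's gradients at every step rather than being checked only after training, which is the intuition behind the joint learning-and-verification framework; the paper itself solves a bilevel problem with formal verification in the loop rather than this single-level penalty approximation.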
