Paper Title
Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees
Paper Authors
Paper Abstract
We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a specified probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.
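To make the certificate idea concrete, below is a minimal, hypothetical PyTorch sketch of a neural-network certificate trained against a supermartingale-style expected-decrease condition. All names here (`Certificate`, `expected_decrease_loss`, the dynamics `f`, the policy `pi`, the noise sampler) are illustrative assumptions, not the paper's exact RASM definition or training procedure.

```python
# Minimal sketch of a neural certificate with a supermartingale-style
# expected-decrease loss. Assumes a stochastic dynamics step f(x, u, w),
# a fixed policy pi(x), and a disturbance sampler; these are hypothetical
# stand-ins, not the paper's API.
import torch
import torch.nn as nn

class Certificate(nn.Module):
    """Candidate certificate V: R^n -> R_{>=0}, represented as a neural network."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # keep V non-negative
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def expected_decrease_loss(V, f, pi, states, noise_sampler,
                           n_mc: int = 16, margin: float = 0.1):
    """Penalize violations of E[V(x')] <= V(x) - margin, estimated by Monte Carlo."""
    v_now = V(states)                      # V(x) on sampled states
    actions = pi(states)
    # Monte Carlo estimate of E[V(f(x, pi(x), w))] over the disturbance w.
    next_vals = torch.stack([
        V(f(states, actions, noise_sampler(states.shape[0])))
        for _ in range(n_mc)
    ]).mean(dim=0)
    # Hinge loss on states where the expected decrease is violated.
    return torch.relu(next_vals - v_now + margin).mean()

# Example on a toy 1-D linear system x' = 0.9*x + u + w (purely illustrative).
V = Certificate(state_dim=1)
f = lambda x, u, w: 0.9 * x + u + w
pi = lambda x: -0.5 * x
noise = lambda n: 0.01 * torch.randn(n, 1)
loss = expected_decrease_loss(V, f, pi, torch.randn(32, 1), noise)
loss.backward()
```

A complete method of the kind described in the abstract would also impose conditions on the certificate over the initial, target, and unsafe sets, and would verify the learned conditions over the whole state space rather than only penalizing sampled violations; the sketch only illustrates the Monte Carlo training signal for the expected-decrease constraint.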