Bayesian Safe Learning and Control with Sum-of-Squares Analysis and Polynomial Kernels

Paper Title

Bayesian Safe Learning and Control with Sum-of-Squares Analysis and Polynomial Kernels

Authors

Alex Devonport, He Yin, Murat Arcak

Abstract

We propose an iterative method to safely learn the unmodeled dynamics of a nonlinear system using Bayesian Gaussian process (GP) models with polynomial kernel functions. The method maintains safety by ensuring that the system state stays within the region of attraction (ROA) of a stabilizing control policy while collecting data. A quadratic programming based exploration control policy is computed to keep the exploration trajectory inside an inner-approximation of the ROA and to maximize the information gained from the trajectory. A prior GP model, which incorporates prior information about the unknown dynamics, is used to construct an initial stabilizing policy. As the GP model is updated with data, it is used to synthesize a new policy and a larger ROA, which increases the range of safe exploration. The use of polynomial kernels allows us to compute ROA inner-approximations and stabilizing control laws for the model using sum-of-squares programming. We also provide a probabilistic guarantee of safety which ensures that the policy computed using the learned model stabilizes the true dynamics with high confidence.
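As a rough illustration of the modeling ingredient named in the abstract, the sketch below fits a GP with a polynomial kernel to samples of an unknown scalar function and evaluates the posterior mean and variance. It is only a minimal sketch under stated assumptions: the kernel form `(x·y + c)^degree`, the function names `poly_kernel`, `gp_posterior`, and `f_true`, the noise level, and the toy data are all hypothetical choices made for this example, not taken from the paper. The sum-of-squares ROA computation and the quadratic-programming exploration policy are not shown.

```python
import numpy as np

def poly_kernel(A, B, degree=2, c=1.0):
    """Polynomial kernel k(x, y) = (x . y + c)^degree between rows of A and B."""
    return (A @ B.T + c) ** degree

def gp_posterior(X, y, Xs, noise_var=1e-4, degree=2, c=1.0):
    """Standard GP regression posterior mean and variance at test inputs Xs."""
    K = poly_kernel(X, X, degree, c) + noise_var * np.eye(len(X))
    Ks = poly_kernel(Xs, X, degree, c)
    Kss = poly_kernel(Xs, Xs, degree, c)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(V ** 2, axis=0)    # posterior variance per test point
    return mean, var

def f_true(X):
    """Hypothetical 'unmodeled dynamics' term: a quadratic in a scalar state."""
    return 0.5 * X[:, 0] ** 2 - 0.3 * X[:, 0]

# Toy data: eight noise-free samples of the unknown term on [-1, 1].
X_train = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)
y_train = f_true(X_train)
X_test = np.array([[0.25], [-0.6]])

mean, var = gp_posterior(X_train, y_train, X_test)
prior_var = np.diag(poly_kernel(X_test, X_test))
print("posterior mean:", mean)
print("true values:   ", f_true(X_test))
print("posterior var: ", var, "prior var:", prior_var)
```

Because a degree-2 polynomial kernel spans exactly the quadratic functions of the state, the posterior mean here is itself a polynomial. That property is what makes such a model compatible with polynomial analysis tools like the sum-of-squares programming the abstract mentions.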
