Paper Title
Reinforcement Learning Based on Real-Time Iteration NMPC
Authors
Abstract
Reinforcement Learning (RL) has demonstrated a remarkable ability to learn optimal policies from data without any prior knowledge of the process. The main drawback of RL is that it is typically very difficult to guarantee stability and safety. Nonlinear Model Predictive Control (NMPC), on the other hand, is an advanced model-based control technique that does guarantee safety and stability, but yields optimality only for the nominal model. It has therefore recently been proposed to use NMPC as a function approximator within RL. While this approach has been shown to yield good performance, the main drawback hindering its applicability is the computational burden of NMPC, which must be solved to full convergence. In practice, however, computationally efficient algorithms such as the Real-Time Iteration (RTI) scheme are deployed to return an approximate NMPC solution within a very short time. In this paper, we bridge this gap by extending the existing theoretical framework to also cover RL based on RTI NMPC. We demonstrate the effectiveness of this new RL approach on a nontrivial example: a challenging nonlinear system subject to stochastic perturbations, controlled with the objective of optimizing an economic cost.
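
To illustrate the idea sketched in the abstract, the following minimal toy example (not taken from the paper; the dynamics, cost weights, and all names are hypothetical) shows Q-learning with an MPC-style action-value function: Q_theta(s, u) is the cost of a parameterized one-step optimal control problem, the greedy input is obtained with a single Newton step warm-started at the previous input (the Real-Time Iteration idea, instead of solving the inner problem to full convergence), and theta is updated with a standard temporal-difference step.

import numpy as np

# Toy sketch (hypothetical; not the paper's algorithm or code): Q-learning with
# an MPC-based action-value function and an RTI-style single-iteration solver.

rng = np.random.default_rng(0)

# Simple scalar stochastic system: s+ = a_sys*s + b_sys*u + noise.
a_sys, b_sys, gamma = 0.9, 0.5, 0.95

# theta parameterizes the inner problem: [state weight, input weight, terminal weight].
theta = np.array([1.0, 0.1, 0.5])

def q_and_grad(s, u, theta):
    """Q_theta(s, u): stage cost plus terminal cost at the nominal successor state."""
    s_next = a_sys * s + b_sys * u
    q = theta[0] * s**2 + theta[1] * u**2 + theta[2] * s_next**2
    return q, np.array([s**2, u**2, s_next**2])  # value and dQ/dtheta

def rti_input(s, u_guess, theta):
    """One Newton step on min_u Q_theta(s, u), warm-started at u_guess.
    A fully converged NMPC solver would iterate this; RTI returns after one step
    (here the step happens to be exact only because the toy Q is quadratic in u)."""
    g = 2 * theta[1] * u_guess + 2 * theta[2] * b_sys * (a_sys * s + b_sys * u_guess)
    h = 2 * theta[1] + 2 * theta[2] * b_sys**2
    return u_guess - g / h

s, u, alpha = 1.0, 0.0, 1e-3
for _ in range(200):
    u = rti_input(s, u, theta)                         # approximately greedy input
    s_next = a_sys * s + b_sys * u + 0.05 * rng.standard_normal()
    stage_cost = s**2 + 0.1 * u**2                     # true (economic) stage cost
    q, dq = q_and_grad(s, u, theta)
    v_next, _ = q_and_grad(s_next, rti_input(s_next, u, theta), theta)
    td = stage_cost + gamma * v_next - q               # temporal-difference error
    theta = np.maximum(theta + alpha * td * dq, 1e-3)  # TD update; keep weights positive
    s = s_next

In the setting the abstract describes, the inner problem is a full NMPC problem over a prediction horizon and the single RTI iteration is an SQP step rather than a scalar Newton step; the structure of the learning loop, however, is the same.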