Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control

Paper title

Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control

Authors

Arash Bahari Kordabad, Mario Zanon, Sebastien Gros

Abstract

This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), either discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if based on an inexact model. This can be achieved by selecting a proper stage cost and terminal cost for the OCP. A very useful particular case of OCP is a Model Predictive Control (MPC) scheme where a deterministic (possibly nonlinear) model is used to reduce the computational complexity. This observation leads us to parameterize an MPC scheme fully, including the cost function. In practice, Reinforcement Learning algorithms can then be used to tune the parameterized MPC scheme. We verify the developed theorems analytically in an LQR case and we investigate some other nonlinear examples in simulations.
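The central claim can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the two-state MDP, the deterministic inexact model `P_hat`, and all function names are hypothetical. It assumes one particular choice of costs, namely terminal cost equal to the optimal value function V* and stage cost Q*(s, a) minus the model's prediction of V* at the next state. With that choice, an undiscounted finite-horizon recursion on the wrong model returns the value function and policy of the discounted MDP.

```python
# Toy check: a finite-horizon, undiscounted OCP built on an inexact model
# reproduces the optimal value function and policy of a discounted MDP.
# All numbers and names are illustrative assumptions, not from the paper.

def expect(p_row, v):
    """Expected value of v under the next-state distribution p_row."""
    return sum(p * x for p, x in zip(p_row, v))

def solve_discounted_mdp(P, L, gamma, iters=2000):
    """Value iteration on the true MDP; returns (Q, V) with V = min_a Q."""
    n_s, n_a = len(L), len(L[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[L[s][a] + gamma * expect(P[s][a], V) for a in range(n_a)]
             for s in range(n_s)]
        V = [min(q) for q in Q]
    return Q, V

def finite_horizon_ocp(P_model, L_stage, V_term, horizon):
    """Undiscounted backward recursion over `horizon` >= 1 steps on the model."""
    n_s, n_a = len(L_stage), len(L_stage[0])
    V = list(V_term)
    Q = None
    for _ in range(horizon):
        Q = [[L_stage[s][a] + expect(P_model[s][a], V) for a in range(n_a)]
             for s in range(n_s)]
        V = [min(q) for q in Q]
    return Q, V

def greedy(Q):
    """Index of the minimizing action in each state."""
    return [min(range(len(q)), key=q.__getitem__) for q in Q]

# True stochastic MDP: P[s][a] is the next-state distribution, L[s][a] the cost.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.1, 0.9], [0.7, 0.3]]]
L = [[2.0, 2.5],
     [0.5, 4.0]]
gamma = 0.9

# Inexact deterministic model, as an MPC scheme might use.
P_hat = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]

Q_star, V_star = solve_discounted_mdp(P, L, gamma)

# Assumed cost modification: stage cost Q* minus the model's expectation of V*,
# with V* itself as the terminal cost.
L_hat = [[Q_star[s][a] - expect(P_hat[s][a], V_star) for a in range(2)]
         for s in range(2)]

Q_hat, V_hat = finite_horizon_ocp(P_hat, L_hat, V_star, horizon=5)
policy_star = greedy(Q_star)
policy_hat = greedy(Q_hat)
print(V_star, V_hat, policy_star, policy_hat)
```

The construction needs V* and Q*, which are unknown in practice. This is consistent with the abstract's suggestion to parameterize the stage and terminal costs and tune them with Reinforcement Learning rather than compute them directly.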
