Paper Title
Model Predictive Control via On-Policy Imitation Learning
Paper Authors
Paper Abstract
In this paper, we leverage the rapid advances in imitation learning, a topic of intense recent focus in the Reinforcement Learning (RL) literature, to develop new sample complexity results and performance guarantees for data-driven Model Predictive Control (MPC) for constrained linear systems. In its simplest form, imitation learning is an approach that tries to learn an expert policy by querying samples from an expert. Recent approaches to data-driven MPC have used the simplest form of imitation learning, known as behavior cloning, to learn controllers that mimic the performance of MPC by sampling trajectories of the closed-loop MPC system online. Behavior cloning, however, is known to be data-inefficient and to suffer from distribution shift. As an alternative, we develop a variant of the forward training algorithm, an on-policy imitation learning method proposed by Ross et al. (2010). Our algorithm exploits the structure of constrained linear MPC, and our analysis uses properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance. We validate our results through simulations and show that the forward training algorithm is indeed superior to behavior cloning when applied to MPC.
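To make the on-policy idea concrete, the sketch below illustrates the generic forward training scheme of Ross et al. (2010) applied to an MPC-style expert on a linear system; it is not the authors' implementation. The dynamics, horizon, the placeholder `mpc_expert` controller, and the use of per-step least-squares policies are illustrative assumptions: at each time step the previously learned policies are rolled out to reach the states the learner will actually visit, and the expert is queried only at those states.

```python
# Minimal sketch (assumed setup, not the paper's code) of forward training
# with an MPC-like expert on x_{t+1} = A x_t + B u_t.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
T = 20                                   # task horizon (assumed)

def mpc_expert(x):
    """Placeholder expert: a real implementation would solve the constrained
    MPC problem at state x; here, a saturated linear feedback law stands in."""
    K = np.array([[0.8, 1.5]])           # illustrative gain
    return np.clip(-K @ x, -1.0, 1.0)    # input constraint |u| <= 1

def fit_policy(states, actions):
    """Fit an affine per-step policy u = W^T [x; 1] by least squares."""
    X = np.hstack([states, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return lambda x, W=W: np.append(x, 1.0) @ W

def sample_x0(n):
    """Sample initial states from an assumed region of interest."""
    return rng.uniform(-1.0, 1.0, size=(n, 2))

# Forward training: learn one policy per time step. At step t, roll out the
# policies already learned for steps 0..t-1, then query the expert at the
# resulting time-t states and fit the step-t policy on those labels.
policies = []
for t in range(T):
    states_t = []
    for x in sample_x0(50):
        for s in range(t):                       # roll out the learned prefix
            u = policies[s](x)
            x = A @ x + B @ np.atleast_1d(u)
        states_t.append(x)
    states_t = np.array(states_t)
    labels_t = np.array([mpc_expert(x).ravel() for x in states_t])
    policies.append(fit_policy(states_t, labels_t))
```

In contrast, behavior cloning would fit a single policy to expert trajectories gathered offline, so the learner is never trained on the states its own mistakes lead to; forward training avoids that distribution shift by construction, at the cost of querying the expert once per time step of the horizon.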