Paper Title
MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty
Paper Authors
Paper Abstract
Reinforcement learning (RL) algorithms have been successfully used to develop control policies for dynamical systems. For many such systems, these policies are trained in a simulated environment. Due to discrepancies between the simulated model and the true system dynamics, RL-trained policies often fail to generalize and adapt appropriately when deployed in the real-world environment. Current research on bridging this sim-to-real gap has largely focused on improvements in simulation design and on the development of improved and specialized RL algorithms for robust control policy generation. In this paper, we apply principles from adaptive control and system identification to develop the model-reference adaptive control and reinforcement learning (MRAC-RL) framework. We propose a set of novel MRAC algorithms applicable to a broad range of linear and nonlinear systems, and derive the associated control laws. The MRAC-RL framework utilizes an inner-loop adaptive controller that allows a simulation-trained outer-loop policy to adapt and operate effectively in a test environment, even when parametric model uncertainty exists. We demonstrate that the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing control policies that can be applied to systems with modeling errors.
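As background for the inner-loop/outer-loop structure mentioned in the abstract, the following is a minimal sketch of a standard direct MRAC formulation for a linear plant, written in common textbook notation; it is illustrative only and is not the specific MRAC-RL control law derived in the paper. The symbols A, B, \Lambda, A_m, B_m, \hat{K}_x, \hat{K}_r, \Gamma_x, \Gamma_r, P, and Q are assumed notation introduced here for the sketch, with the input-uncertainty matrix \Lambda assumed symmetric positive definite.

Plant with parametric uncertainty (A and \Lambda unknown):
    \dot{x} = A x + B \Lambda u
Reference model, driven by the command r produced by the outer-loop (e.g., simulation-trained) policy:
    \dot{x}_m = A_m x_m + B_m r
Inner-loop adaptive control law and tracking error:
    u = \hat{K}_x^\top x + \hat{K}_r^\top r, \qquad e = x - x_m
Gradient-type adaptive laws, with P = P^\top > 0 solving the Lyapunov equation A_m^\top P + P A_m = -Q for some Q = Q^\top > 0:
    \dot{\hat{K}}_x = -\Gamma_x \, x \, e^\top P B, \qquad \dot{\hat{K}}_r = -\Gamma_r \, r \, e^\top P B

Under the usual matching conditions (there exist ideal gains K_x^*, K_r^* such that A + B \Lambda K_x^{*\top} = A_m and B \Lambda K_r^{*\top} = B_m), a Lyapunov argument gives convergence of the tracking error e to zero despite the unknown parameters. Intuitively, this is the mechanism by which an outer-loop policy trained against the reference (simulation) model can remain effective on the perturbed real plant.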