Paper Title


Inverse Optimal Control with Discount Factor for Continuous and Discrete-Time Control-Affine Systems and Reinforcement Learning

Paper Authors

Rodrigues, Luis

Paper Abstract


This paper addresses the inverse optimal control problem of finding the state weighting function that leads to a quadratic value function when the cost on the input is fixed to be quadratic. The paper focuses on a class of infinite-horizon discrete-time and continuous-time optimal control problems whose dynamics are control-affine and whose cost is quadratic in the input. The optimal control policy for this problem is the projection of minus the gradient of the value function onto the space formed by all feasible control directions. This projection points along the control direction of steepest decrease of the value function. For discrete-time systems and a quadratic value function, the optimal control law can be obtained as the solution of a regularized least-squares program, which corresponds to a receding horizon control with a single step ahead. For the single-input case and a quadratic value function, the solution for small weights on the control energy is interpreted as a control policy that at each step brings the trajectories of the system as close as possible to the origin, as measured by an appropriate norm. Conditions under which the optimal control law is linear are also stated. Additionally, the paper offers a mapping of the optimal control formulation to an equivalent reinforcement learning formulation. Examples show the application of the theoretical results.
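The one-step regularized least-squares interpretation mentioned in the abstract can be sketched numerically. This is an illustrative sketch, not the paper's implementation: assuming discrete-time control-affine dynamics x_{k+1} = f(x_k) + g(x_k) u_k, input cost u^T R u, and a quadratic value function V(x) = x^T P x, the one-step-ahead policy minimizes u^T R u + V(f(x) + g(x) u), whose closed form follows from setting the gradient in u to zero. All symbols and the scalar example below are assumptions for illustration.

```python
import numpy as np

def one_step_control(f_x, g_x, P, R):
    """One-step-ahead regularized least-squares control.

    Minimizes  u' R u + (f(x) + g(x) u)' P (f(x) + g(x) u)
    by solving the normal equations  (R + g' P g) u = -g' P f.
    """
    return -np.linalg.solve(R + g_x.T @ P @ g_x, g_x.T @ P @ f_x)

# Hypothetical scalar example: x_{k+1} = 2 x_k + u_k, V(x) = x^2, R = 0.1
x = 1.0
f_x = np.array([[2.0 * x]])   # f(x) = 2x
g_x = np.array([[1.0]])       # g(x) = 1
P = np.array([[1.0]])
R = np.array([[0.1]])
u = one_step_control(f_x, g_x, P, R)   # -> -2x / (0.1 + 1) = -1.8181...
```

As R → 0 the policy drives f(x) + g(x) u as close to the origin as possible in the norm induced by P, matching the small-control-weight interpretation in the abstract.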
