Policy iteration: for want of recursive feasibility, all is not lost

Paper title

Policy iteration: for want of recursive feasibility, all is not lost

Authors

Granzotto, Mathieu; De Silva, Olivier Lindamulage; Postoyan, Romain; Nesic, Dragan; Jiang, Zhong-Ping

Abstract

This paper investigates recursive feasibility, recursive robust stability and near-optimality properties of policy iteration (PI). For this purpose, we consider deterministic nonlinear discrete-time systems whose inputs are generated by PI for undiscounted cost functions. We first assume that PI is recursively feasible, in the sense that the optimization problems solved at each iteration admit a solution. In this case, we provide novel conditions to establish recursive robust stability properties for a general attractor, meaning that the policies generated at each iteration ensure a robust $\mathcal{KL}$-stability property with respect to a general state measure. We then derive novel explicit bounds on the mismatch between the (suboptimal) value function returned by PI at each iteration and the optimal one. Afterwards, motivated by a counter-example that shows that PI may fail to be recursively feasible, we modify PI so that recursive feasibility is guaranteed a priori under mild conditions. This modified algorithm, called PI+, is shown to preserve the recursive robust stability when the attractor is compact. Additionally, PI+ enjoys the same near-optimality properties as its PI counterpart under the same assumptions. Therefore, PI+ is an attractive tool for generating near-optimal stabilizing control of deterministic discrete-time nonlinear systems.
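
For readers unfamiliar with PI, the following is a minimal sketch of textbook policy iteration for a deterministic discrete-time system with an undiscounted cost. It is not the paper's PI+ algorithm. The toy system, the stage cost, and every name in the code (`f`, `stage_cost`, `evaluate`, `improve`, `policy_iteration`) are hypothetical choices made for illustration only. Because the input set here is finite, the minimization in the improvement step always admits a solution, so the recursive feasibility issue studied in the paper does not arise in this example.

```python
import math

# Hypothetical toy system (not from the paper): states 0..5 on a line,
# target state 0, input u in {1, 2} = number of cells to move toward 0.
STATES = range(6)
INPUTS = (1, 2)


def f(x, u):
    """Deterministic dynamics x+ = f(x, u)."""
    return max(x - u, 0)


def stage_cost(x, u):
    """Undiscounted stage cost; zero once the target state is reached."""
    return 0 if x == 0 else x + u * u


def evaluate(policy, max_steps=100):
    """Policy evaluation: sum the stage costs along each closed-loop trajectory."""
    values = []
    for x0 in STATES:
        x, total = x0, 0
        for _ in range(max_steps):
            if x == 0:
                break
            u = policy[x]
            total += stage_cost(x, u)
            x = f(x, u)
        if x != 0:
            total = math.inf  # target never reached: infinite undiscounted cost
        values.append(total)
    return values


def improve(values):
    """Policy improvement: minimize stage cost plus the value of the next state."""
    return [
        min(INPUTS, key=lambda u: stage_cost(x, u) + values[f(x, u)])
        for x in STATES
    ]


def policy_iteration(policy, max_iters=50):
    """Alternate evaluation and improvement until the policy stops changing."""
    history = []
    for _ in range(max_iters):
        values = evaluate(policy)
        history.append(values)
        new_policy = improve(values)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, history


# Start from a stabilizing policy: always move one cell toward the target.
final_policy, history = policy_iteration([1] * len(STATES))
for i, values in enumerate(history):
    print(f"iteration {i}: V = {values}")
```

In this sketch the initial policy already drives every state to the target, so each evaluated value function is finite, and the value at every state does not increase from one iteration to the next.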
