Paper Title

Convergence and Sample Complexity of Policy Gradient Methods for Stabilizing Linear Systems

Authors

Feiran Zhao, Xingyun Fu, Keyou You

Abstract

System stabilization via policy gradient (PG) methods has drawn increasing attention in both the control and machine learning communities. In this paper, we study their convergence and sample complexity for stabilizing linear time-invariant systems, measured in the number of system rollouts. Our analysis is built upon a discounted linear quadratic regulator (LQR) method which alternately updates the policy and the discount factor of the LQR problem. First, we propose an explicit rule to adaptively adjust the discount factor by exploring the stability margin of a linear control policy. Then, we establish the sample complexity of PG methods for stabilization, which adds only a coefficient logarithmic in the spectral radius of the state matrix to that of solving the LQR problem with a prior stabilizing policy. Finally, we perform simulations to validate our theoretical findings and demonstrate the effectiveness of our method on a class of nonlinear systems.
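The abstract describes the alternating discounted-LQR scheme only at a high level. Below is a minimal, model-based sketch of that idea, not the paper's algorithm: it assumes known system matrices (A, B), replaces the paper's rollout-based zeroth-order gradient estimates with an exact Gauss-Newton policy step, and uses an illustrative discount-update rule with an ad hoc 0.9 safety factor in place of the paper's explicit rule. It relies on a standard identity: the γ-discounted LQR coincides with the undiscounted LQR for the scaled pair (√γ·A, √γ·B), so a small enough γ makes even K = 0 admissible, and growing γ toward 1 yields a policy that stabilizes the original system.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))


def solve_discounted_lqr(A, B, Q, R, K, gamma, iters=50):
    """Solve the gamma-discounted LQR, which coincides with the standard
    LQR for the scaled pair (sqrt(gamma) A, sqrt(gamma) B). The policy K
    must stabilize the scaled system on entry."""
    As, Bs = np.sqrt(gamma) * A, np.sqrt(gamma) * B
    for _ in range(iters):
        Acl = As - Bs @ K
        # Cost-to-go matrix of u = -Kx:  P = Acl' P Acl + Q + K' R K.
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Gauss-Newton policy-gradient step with step size 1/2 (equivalently,
        # policy iteration), which converges monotonically on LQR.
        K = np.linalg.solve(R + Bs.T @ P @ Bs, Bs.T @ P @ As)
    return K


def stabilize(A, B, Q, R, max_outer=50):
    """Alternate between solving a discounted LQR and enlarging the
    discount factor; once gamma reaches 1, K stabilizes (A, B)."""
    K = np.zeros((B.shape[1], A.shape[0]))
    # Pick gamma so that K = 0 stabilizes the scaled system:
    # gamma * rho(A)^2 < 1 (0.9 is an illustrative safety factor).
    gamma = min(1.0, 0.9 / spectral_radius(A) ** 2)
    for _ in range(max_outer):
        K = solve_discounted_lqr(A, B, Q, R, K, gamma)
        if gamma >= 1.0:
            break  # K stabilizes the original, undiscounted system
        # Grow gamma as far as the stability margin of K allows, i.e.
        # keep sqrt(gamma) * rho(A - B K) below 1.
        gamma_next = min(1.0, 0.9 / spectral_radius(A - B @ K) ** 2)
        if gamma_next <= gamma:
            break  # no usable margin left; should not occur generically
        gamma = gamma_next
    return K, gamma


if __name__ == "__main__":
    # Illustrative open-loop unstable system (eigenvalues 1.2 and 1.1).
    A = np.array([[1.2, 0.5], [0.0, 1.1]])
    B = np.array([[0.0], [1.0]])
    K, gamma = stabilize(A, B, np.eye(2), np.eye(1))
    print(gamma, spectral_radius(A - B @ K))  # gamma -> 1.0, rho < 1
```

Running the demo drives γ to 1 and returns a K with ρ(A − BK) < 1. The paper's sample-complexity result concerns the model-free counterpart of this loop, where each discounted LQR is solved from rollouts: that version needs only a factor logarithmic in ρ(A) more rollouts than solving a single LQR from a given stabilizing policy.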
