Paper Title


Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems

Authors

Michael Giegrich, Christoph Reisinger, Yufei Zhang

Abstract


We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
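The policy class and the two geometry-aware updates described above can be illustrated with a deliberately minimal scalar toy problem. This is my own one-period construction, not the paper's algorithm, and all parameter values (`q`, `r`, `tau`, `eta`) are assumptions; it only shows the mechanics of a Fisher-preconditioned step for the Gaussian policy's mean parameter and a Bures-Wasserstein multiplicative step for its state-independent variance.

```python
# Toy one-period exploratory LQ sketch (NOT the paper's algorithm).
# State x ~ N(0, 1); Gaussian policy a ~ N(k * x, s) with a mean that is
# linear in the state and a state-independent variance s, matching the
# policy class in the abstract. Entropy-regularised cost (up to constants):
#   J(k, s) = q + r * (k**2 + s) - 0.5 * tau * log(s),
# so dJ/dk = 2*r*k, dJ/ds = r - tau/(2*s), and the minimiser is
# (k*, s*) = (0, tau / (2*r)).

q, r, tau = 1.0, 1.0, 0.2   # assumed cost weights and entropy strength
eta = 0.1                   # step size
k, s = 1.0, 1.0             # initial policy parameters

for _ in range(1000):
    # Fisher (natural-gradient) step for the mean: the Fisher information
    # of k under this policy is E[x^2]/s = 1/s, so preconditioning by its
    # inverse multiplies the Euclidean gradient 2*r*k by s.
    k -= eta * s * (2.0 * r * k)
    # Bures-Wasserstein step for the variance: the multiplicative update
    # s <- (1 - eta*g)^2 * s keeps s > 0 whenever eta*g < 1 (here g <= r).
    g = r - tau / (2.0 * s)
    s *= (1.0 - eta * g) ** 2

# Both parameters converge at a linear rate to (k*, s*) = (0, tau/(2r)).
```

The multiplicative form of the variance update is the point of using the Bures-Wasserstein geometry in this sketch: unlike a plain Euclidean step `s -= eta * g`, it cannot push the variance through zero, which mirrors the abstract's claim that geometry-aware descent directions yield bounded, well-defined policy iterates.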
