Paper Title

Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

Paper Authors

Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

Paper Abstract

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
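For readers unfamiliar with the setup, here is a minimal sketch of the Q-NPG iteration the rates refer to, written in the standard compatible-function-approximation form; the symbols $\phi$, $d_k$, and $\eta_k$ and the precise sampling and step-size constants are notational assumptions here, and the exact algorithm is specified in the paper. With a feature map $\phi(s,a) \in \mathbb{R}^d$, the log-linear policy class is

$$
\pi_\theta(a \mid s) \;=\; \frac{\exp\bigl(\theta^\top \phi(s,a)\bigr)}{\sum_{a'} \exp\bigl(\theta^\top \phi(s,a')\bigr)} .
$$

Each iteration first fits the current action-value function over the features under a state-action distribution $d_k$,

$$
w_k \;\approx\; \arg\min_{w \in \mathbb{R}^d} \; \mathbb{E}_{(s,a)\sim d_k}\Bigl[\bigl(w^\top \phi(s,a) - Q^{\pi_{\theta_k}}(s,a)\bigr)^2\Bigr],
$$

and then takes the natural-gradient step $\theta_{k+1} = \theta_k + \eta_k w_k$, which in policy space reads

$$
\pi_{\theta_{k+1}}(a \mid s) \;\propto\; \pi_{\theta_k}(a \mid s)\,\exp\bigl(\eta_k\, w_k^\top \phi(s,a)\bigr),
$$

i.e., an inexact policy mirror descent (KL-proximal) step whose target $w_k^\top \phi$ approximates $Q^{\pi_{\theta_k}}$ (the NPG variant instead regresses the advantage function onto centered features). The linear rate stated in the abstract corresponds to running this update with a non-adaptive, geometrically increasing step-size schedule, i.e., $\eta_k$ grows by a fixed factor each iteration, while a constant $\eta_k$ yields the sublinear rate obtained as a byproduct.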
