Paper Title
On the Convergence Rates of Policy Gradient Methods
Paper Authors
Paper Abstract
We consider infinite-horizon discounted Markov decision problems with finite state and action spaces and study the convergence rates of the projected policy gradient method and a general class of policy mirror descent methods, all with direct parametrization in the policy space. First, we develop a theory of weak gradient-mapping dominance and use it to prove a sharper sublinear convergence rate for the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods, including the natural policy gradient method and a projected Q-descent method, all enjoy a linear rate of convergence without relying on entropy or other strongly convex regularization. Finally, we analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model.
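The abstract refers to policy mirror descent with direct (tabular) parametrization and geometrically increasing step sizes. Below is a minimal sketch of that setup, not the paper's implementation: the function and parameter names (`evaluate_q`, `policy_mirror_descent`, `P`, `r`, `eta0`, `growth`) are illustrative, and exact policy evaluation stands in for the sampling-based evaluation of the inexact/generative-model setting. With the KL divergence as the Bregman term, each mirror step is a closed-form exponentiated-gradient update, which coincides with the natural policy gradient update under direct parametrization.

```python
import numpy as np


def evaluate_q(P, r, pi, gamma):
    """Exact policy evaluation for a tabular MDP.

    P[s, a, s'] is the transition probability, r[s, a] the reward, and
    pi[s, a] the action probabilities of the current policy.  Solves
    (I - gamma * P_pi) V = r_pi and returns Q = r + gamma * P V.
    """
    S, _ = r.shape
    P_pi = np.einsum('sab,sa->sb', P, pi)        # state-to-state transitions under pi
    r_pi = np.einsum('sa,sa->s', r, pi)          # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum('sab,b->sa', P, V)


def policy_mirror_descent(P, r, gamma=0.9, eta0=1.0, growth=2.0, iters=30):
    """Policy mirror descent with the KL Bregman term and geometrically
    increasing step sizes eta_k = eta0 * growth**k.

    With the KL divergence, each mirror step has the closed form
    pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta_k * Q^{pi_k}(s, a)),
    i.e. an exponentiated-gradient update on the probability simplex.
    """
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)                # start from the uniform policy
    eta = eta0
    for _ in range(iters):
        Q = evaluate_q(P, r, pi, gamma)          # exact Q^{pi_k}; the paper's inexact
                                                 # variant would estimate this from samples
        logits = np.log(np.maximum(pi, 1e-300)) + eta * Q   # floor avoids log(0)
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)      # renormalize rows onto the simplex
        eta *= growth                            # geometric step-size schedule
    return pi


if __name__ == "__main__":
    # Toy random MDP, only to exercise the routine.
    rng = np.random.default_rng(0)
    S, A = 5, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    print(policy_mirror_descent(P, r).round(3))
```

Setting `growth=1.0` recovers a constant step size; the geometric schedule is the ingredient the abstract associates with the linear convergence rate, without entropy or other strongly convex regularization.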