Paper Title

Momentum-Based Policy Gradient Methods

Paper Authors

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Paper Abstract


In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduction technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduction technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the non-concave performance function, requiring only one trajectory at each iteration. In particular, we present a non-adaptive version of the IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we use four benchmark tasks to demonstrate the effectiveness of our algorithms.
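The abstract describes combining a momentum-based variance-reduced gradient estimator (in the spirit of STORM) with an importance weight that corrects for the change in trajectory distribution between consecutive policies. The sketch below is only a rough illustration of what such a single-trajectory update could look like under that reading; `pg_estimate`, `importance_weight`, and the decaying schedules are hypothetical placeholders, not the paper's actual estimators or parameter choices.

```python
import numpy as np

# Hypothetical placeholders (not from the paper): a single-trajectory
# policy-gradient estimate and the likelihood ratio between two policies.
def pg_estimate(theta, tau):
    """Toy stand-in for an unbiased policy-gradient estimate from one trajectory."""
    return tau - theta  # illustrative only

def importance_weight(tau, theta_old, theta_new):
    """Toy stand-in for p(tau | theta_old) / p(tau | theta_new)."""
    return 1.0  # illustrative only

def momentum_vr_pg_step(theta, theta_prev, u_prev, tau, beta, eta):
    """One momentum-based variance-reduced policy-gradient step (sketch).

    u_t = beta * g(theta_t)
          + (1 - beta) * (u_{t-1} + g(theta_t) - w * g(theta_{t-1})),
    where w re-weights the old-policy gradient so both terms are estimated
    from the same single trajectory tau sampled under the current policy.
    """
    g_new = pg_estimate(theta, tau)
    g_old = pg_estimate(theta_prev, tau)
    w = importance_weight(tau, theta_prev, theta)
    u = beta * g_new + (1.0 - beta) * (u_prev + g_new - w * g_old)
    theta_next = theta + eta * u  # ascent step on the performance function
    return theta_next, u

# Minimal usage with toy values: one trajectory per iteration, no large batches.
theta, theta_prev, u = np.zeros(4), np.zeros(4), np.zeros(4)
for t in range(1, 6):
    tau = np.random.randn(4)               # stands in for one sampled trajectory
    beta, eta = 1.0 / t, 0.1 / t ** (1 / 3)  # decaying momentum and step size (illustrative)
    theta_next, u = momentum_vr_pg_step(theta, theta_prev, u, tau, beta, eta)
    theta_prev, theta = theta, theta_next
```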
