Paper Title
Variance Reduction based Experience Replay for Policy Optimization
Paper Authors
Paper Abstract
For reinforcement learning on complex stochastic systems, where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy, which treats all observations equally, overlooks the relative importance of different samples. To overcome this limitation, we propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation. This selective mechanism adaptively puts more weight on past samples that are more likely to have been generated by the current target distribution. Our theoretical and empirical studies show that the proposed VRER can accelerate the learning of the optimal policy and enhance the performance of state-of-the-art policy optimization approaches.
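The selection idea described in the abstract can be illustrated with a small sketch. The Python snippet below is an illustrative approximation, not the paper's implementation: it assumes a one-dimensional Gaussian policy and a hypothetical rule that reuses a historical batch only when the empirical variance of its importance ratios w = pi_now / pi_old stays below a threshold. The function names (`select_batches`, `log_prob`), the threshold `max_ratio_var`, and the toy gradient terms are invented for the example.

    # Minimal sketch (not the paper's implementation) of likelihood-ratio-based
    # selective sample reuse, assuming a 1-D Gaussian policy pi(a) = N(mu, sigma^2).
    # `select_batches` and the threshold `max_ratio_var` are illustrative names.
    import numpy as np

    def log_prob(actions, mu, sigma):
        """Log-density of actions under a Gaussian policy N(mu, sigma^2)."""
        return -0.5 * ((actions - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

    def select_batches(history, mu_now, sigma_now, max_ratio_var=1.0):
        """Reuse a historical batch only when the importance ratios
        w = pi_now(a) / pi_old(a) have low empirical variance, i.e. when the
        behavior policy that generated it is close to the current target policy."""
        selected = []
        for actions, grads, mu_old, sigma_old in history:
            w = np.exp(log_prob(actions, mu_now, sigma_now)
                       - log_prob(actions, mu_old, sigma_old))
            if np.var(w) <= max_ratio_var:    # illustrative selection rule
                selected.append(w * grads)    # ratio-weighted gradient terms
        return selected

    # Toy usage: each entry stores (actions, per-sample gradient terms, and the
    # behavior-policy parameters that generated them).
    rng = np.random.default_rng(0)
    history = []
    for mu_old in (0.0, 0.1, 1.5):            # 1.5 is far from the current policy
        actions = rng.normal(mu_old, 1.0, size=256)
        grads = actions - mu_old              # toy score-function terms
        history.append((actions, grads, mu_old, 1.0))

    reused = select_batches(history, mu_now=0.05, sigma_now=1.0)
    grad_estimate = np.mean(np.concatenate(reused)) if reused else 0.0
    print(f"{len(reused)} of {len(history)} batches reused; grad ~ {grad_estimate:.3f}")

Under this sketch, batches from stale behavior policies are excluded because their importance ratios are heavy-tailed and would inflate, rather than reduce, the variance of the policy gradient estimate, which is the intuition behind reusing only samples close to the current target distribution.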