Paper Title
Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
Paper Authors
Abstract
Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains, intended for future off-policy deep reinforcement learning applications in which the memory allocated to the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we equip our algorithm with a novel off-policy correction technique that requires no action probability estimates. We test our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents and exhibits robust performance when the replay memory is strictly limited.