Paper Title
M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network
Paper Authors
Paper Abstract
Deep Q-learning Network (DQN) successfully combines reinforcement learning with deep neural networks and has led to the widespread application of reinforcement learning. One challenging problem when applying DQN or other reinforcement learning algorithms to real-world problems is data collection. Therefore, improving data efficiency is one of the most important problems in reinforcement learning research. In this paper, we propose a framework that uses a Max-Mean loss in the Deep Q-Network (M$^2$DQN). Instead of sampling one batch of experiences at each training step, we sample several batches from the experience replay and update the parameters so that the maximum TD-error among these batches is minimized. The proposed method can be combined with most existing DQN techniques by simply replacing the loss function. We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), in several gym games. The results show that our method leads to a substantial improvement in both learning speed and performance.
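A minimal sketch of how the Max-Mean loss described above could be written, assuming $K$ sampled batches $B_1, \dots, B_K$ and the standard DQN TD-error with target-network parameters $\theta^-$ (the number of batches, the squared-error form, and the exact target are our assumptions from the abstract; the DDQN variant would select $a'$ with the online network instead of the inner $\max$):

$$\mathcal{L}(\theta) = \max_{i \in \{1,\dots,K\}} \; \frac{1}{|B_i|} \sum_{(s,a,r,s') \in B_i} \Big( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \Big)^2$$

That is, the mean TD-error is computed separately for each sampled batch, and gradient updates are applied only to the batch with the largest mean error, which is what allows the framework to wrap existing DQN variants by swapping the loss while leaving the rest of the algorithm unchanged.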