Paper Title
Some approaches used to overcome overestimation in Deep Reinforcement Learning algorithms
Paper Authors
Paper Abstract
Some phenomena related to statistical noise, which have been investigated by various authors in the framework of deep reinforcement learning (RL) algorithms, are discussed. The following algorithms are examined: deep Q-network (DQN), double DQN, deep deterministic policy gradient (DDPG), twin delayed DDPG (TD3), and the hill climbing algorithm. First, we consider overestimation, a harmful property caused by noise. Then we deal with the noise used for exploration, which is useful noise. We discuss how to set the noise parameters in TD3 for typical PyBullet environments associated with articulated bodies, such as HopperBulletEnv and Walker2DBulletEnv. In the appendix, in connection with the hill climbing algorithm, another noise-related example is considered: adaptive noise.
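The overestimation the abstract refers to can be reproduced in a few lines: even when every Q-value estimate is unbiased, applying a max operator over the noisy estimates turns zero-mean noise into a positive bias. The sketch below is not from the paper; the Q-values and noise level are hypothetical, chosen only to make the bias visible.

```python
# A minimal sketch (assumed values, not from the paper) showing that
# E[max_a Qhat(a)] >= max_a Q(a) when Qhat(a) = Q(a) + zero-mean noise.
import numpy as np

rng = np.random.default_rng(0)

true_q = np.array([1.0, 1.0, 1.0, 1.0])  # all actions equally good
n_trials = 100_000
noise_std = 0.5                           # hypothetical estimation noise

# Unbiased noisy estimates of Q for each action, one row per trial.
noisy_q = true_q + rng.normal(0.0, noise_std, size=(n_trials, true_q.size))

print(f"max of true Q-values:             {true_q.max():.3f}")
print(f"mean of max over noisy estimates: {noisy_q.max(axis=1).mean():.3f}")
# The second number exceeds the first: the max operator converts
# zero-mean noise into a positive bias. This is the effect that
# double DQN and TD3's clipped double-Q target are designed to reduce.
```

With these values the simulated mean of the max comes out near 1.5 rather than the true 1.0, which is why the paper treats this noise as harmful, in contrast to the deliberately injected exploration noise discussed for TD3.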