Paper Title
Consolidated Adaptive T-soft Update for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Demand for deep reinforcement learning (DRL) to enable robots to perform complex tasks is gradually increasing, while DRL is known to be unstable. As a technique to stabilize its learning, a target network that slowly and asymptotically matches a main network is widely employed to generate stable pseudo-supervised signals. Recently, T-soft update has been proposed as a noise-robust update rule for the target network and has contributed to improving DRL performance. However, the noise robustness of T-soft update is specified by a hyperparameter, which should be tuned for each task, and deteriorates under a simplified implementation. This study develops adaptive T-soft (AT-soft) update by utilizing the update rule in AdaTerm, which has been developed recently. In addition, the concern that the target network does not asymptotically match the main network is mitigated by a new consolidation that brings the main network back to the target network. This so-called consolidated AT-soft (CAT-soft) update is verified through numerical simulations.
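The abstract describes two ingredients: a target network that slowly tracks the main network, and a consolidation step that pulls the main network back toward the target. A minimal sketch of these two ideas is below, using plain NumPy parameter vectors. Note that this shows only the standard Polyak-style soft update and a hypothetical consolidation weight `lam`; the actual T-soft/AT-soft rules replace the fixed step size with a noise-robust, adaptively computed weight, which is not reproduced here.

```python
import numpy as np

def soft_update(target, main, tau=0.01):
    """Polyak-style soft update: the target slowly, asymptotically
    tracks the main network. (T-soft/AT-soft update replace the fixed
    tau with a noise-robust, adaptive weight; not shown here.)"""
    return (1.0 - tau) * target + tau * main

def consolidate(main, target, lam=0.1):
    """Hypothetical consolidation step: pull the main parameters back
    toward the target so the two networks stay consistent.
    `lam` is an assumed illustration-only coefficient."""
    return (1.0 - lam) * main + lam * target

# Toy parameter vectors standing in for network weights.
main = np.array([1.0, 2.0])
target = np.zeros(2)

target = soft_update(target, main)   # target moves slightly toward main
main = consolidate(main, target)     # main is pulled slightly back toward target
```

Each training step would interleave a gradient update of `main` with these two parameter-space updates; the consolidation addresses the concern that the target never quite catches up to a drifting main network.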