Paper Title
Robust Action Gap Increasing with Clipped Advantage Learning
Paper Authors
Paper Abstract
Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve robustness to estimation errors. However, the method becomes problematic when the optimal action induced by the approximate value function does not agree with the true optimal action. In this paper, we present a novel method, named clipped Advantage Learning (clipped AL), to address this issue. The method is inspired by our observation that blindly increasing the action gap for all given samples, without taking their necessity into account, can accumulate more errors in the performance loss bound and lead to slow value convergence; to avoid this, the advantage value should be adjusted adaptively. We show that our simple clipped AL operator not only enjoys a fast convergence guarantee but also retains proper action gaps, hence striking a good balance between a large action gap and fast convergence. The feasibility and effectiveness of the proposed method are verified empirically on several RL benchmarks with promising performance.
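To make the idea concrete, below is a minimal tabular sketch. The first backup follows the standard AL operator of Bellemare et al. (2016), T_AL Q(s,a) = T Q(s,a) - α (max_b Q(s,b) - Q(s,a)); the second caps the advantage correction at a threshold. The clipping rule, the threshold `delta`, and the function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def al_backup(q, r, s, a, s_next, gamma=0.99, alpha=0.9):
    """One tabular Advantage Learning (AL) backup (Bellemare et al., 2016):
    T_AL Q(s,a) = r + gamma * max_a' Q(s',a') - alpha * (max_b Q(s,b) - Q(s,a)).
    The subtracted advantage term widens the gap between the greedy
    action and its competitors."""
    bellman = r + gamma * np.max(q[s_next])
    gap = np.max(q[s]) - q[s, a]          # current advantage deficit (>= 0)
    return bellman - alpha * gap

def clipped_al_backup(q, r, s, a, s_next, gamma=0.99, alpha=0.9, delta=1.0):
    """Hypothetical clipped-AL backup: the advantage correction is capped at
    `delta`, so the action gap is not increased blindly on every sample.
    The paper's actual clipping rule may differ; this is only a sketch."""
    bellman = r + gamma * np.max(q[s_next])
    gap = np.max(q[s]) - q[s, a]
    return bellman - alpha * min(gap, delta)  # clip the advantage penalty
```

Under this reading, samples whose current gap already exceeds `delta` contribute a bounded correction, which matches the abstract's point that unbounded gap increases can inflate the performance loss bound and slow convergence.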