Paper Title
Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning
Paper Authors
Paper Abstract
$Q$-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and was identified in Sutton (1999) as one of the most important theoretical open problems in the RL community. Even in the basic linear function approximation setting, there are well-known divergent examples. In this work, we show that \textit{target network} and \textit{truncation} together are enough to provably stabilize $Q$-learning with linear function approximation, and we establish finite-sample guarantees. The result implies an $O(\epsilon^{-2})$ sample complexity up to a function approximation error. Moreover, our results do not require strong assumptions or modifications of the problem parameters as in the existing literature.
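To make the two ingredients named in the abstract concrete, the following is a minimal sketch (not the paper's exact algorithm or analysis) of $Q$-learning with linear function approximation, a periodically synchronized target network, and truncation of the bootstrapped value estimate. The environment interface `env.reset()`/`env.step()`, the feature map `phi`, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def linear_q_learning(env, phi, num_actions, gamma=0.99, alpha=0.1,
                      outer_iters=100, inner_iters=1000, r_max=1.0, seed=0):
    """Sketch: linear Q-learning with a target network and truncation.

    phi(s, a) -> feature vector of dimension d (assumed given).
    env is assumed to expose reset() -> s and step(a) -> (s_next, r, done).
    """
    rng = np.random.default_rng(seed)
    d = phi(env.reset(), 0).shape[0]
    theta = np.zeros(d)            # online weights
    theta_target = theta.copy()    # target-network weights, frozen between outer loops
    v_max = r_max / (1.0 - gamma)  # truncation level for value estimates

    for _ in range(outer_iters):
        s = env.reset()
        for _ in range(inner_iters):
            a = rng.integers(num_actions)  # behavior policy: uniform exploration
            s_next, r, done = env.step(a)

            # Bootstrapped value computed from the *frozen* target network,
            # then truncated to the admissible range [0, v_max].
            q_next = max(phi(s_next, b) @ theta_target for b in range(num_actions))
            target = r + gamma * np.clip(q_next, 0.0, v_max)

            # Semi-gradient TD update of the online weights.
            td_error = target - phi(s, a) @ theta
            theta = theta + alpha * td_error * phi(s, a)

            s = env.reset() if done else s_next

        # Target network is synchronized only at the end of each outer loop.
        theta_target = theta.copy()

    return theta
```

The key design choice illustrated here is that the bootstrapped target uses the frozen weights `theta_target` rather than the online weights, and the resulting value estimate is clipped to the range achievable under bounded rewards; both are the stabilizing mechanisms the abstract refers to.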