Paper Title
Reinforcement Learning with Probabilistically Complete Exploration
Paper Authors
Paper Abstract
Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse-reward case, where they can do no better than explore in all directions until the first positive rewards are found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement Learning (R3L). We formulate exploration as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Tree (RRT) to find initial solutions. These solutions are used as demonstrations to initialize a policy, which is then refined by a generic RL algorithm, leading to faster and more stable convergence. We provide theoretical guarantees that R3L exploration finds successful solutions, as well as bounds on its sampling complexity. We experimentally demonstrate that the method outperforms classic and intrinsic exploration techniques, requiring only a fraction of the exploration samples and achieving better asymptotic performance.
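To make the pipeline described in the abstract concrete, the sketch below walks through the same three steps on a toy 2D point-mass task with a sparse goal: grow an RRT-style tree by propagating random actions, backtrack the first branch that reaches the goal into a demonstration trajectory, and use behavioural cloning to warm-start a policy that a generic RL algorithm could then refine. The environment, hyper-parameters, candidate-action extension rule, and linear least-squares policy are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch of RRT-style exploration followed by behavioural-cloning warm-start.
# Toy 2D point-mass environment and linear policy are assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
GOAL, GOAL_RADIUS, STEP = np.array([0.9, 0.9]), 0.05, 0.05

def propagate(state, action):
    """Toy deterministic dynamics: move by a bounded 2D action, clipped to the unit square."""
    return np.clip(state + STEP * np.clip(action, -1.0, 1.0), 0.0, 1.0)

# --- RRT-style exploration: each node stores (state, parent index, action taken) ---
states, parents, actions = [np.zeros(2)], [-1], [np.zeros(2)]
goal_idx = None
for _ in range(20000):
    # Sample a target state (with a small goal bias), find the nearest tree node.
    target = GOAL if rng.random() < 0.05 else rng.uniform(0.0, 1.0, size=2)
    near = int(np.argmin(np.linalg.norm(np.array(states) - target, axis=1)))
    # Try a few random controls and keep the one that lands closest to the target.
    cands = rng.uniform(-1.0, 1.0, size=(8, 2))
    next_states = propagate(states[near], cands)
    best = int(np.argmin(np.linalg.norm(next_states - target, axis=1)))
    states.append(next_states[best]); parents.append(near); actions.append(cands[best])
    if np.linalg.norm(next_states[best] - GOAL) < GOAL_RADIUS:
        goal_idx = len(states) - 1
        break
assert goal_idx is not None, "goal not reached; increase the sample budget"

# --- Backtrack the successful branch into (state, action) demonstration pairs ---
demo_s, demo_a, idx = [], [], goal_idx
while parents[idx] != -1:
    demo_s.append(states[parents[idx]]); demo_a.append(actions[idx])
    idx = parents[idx]
demo_s, demo_a = np.array(demo_s[::-1]), np.array(demo_a[::-1])

# --- Behavioural cloning: least-squares linear policy a = W^T [s, 1] as a warm start ---
X = np.hstack([demo_s, np.ones((len(demo_s), 1))])
W, *_ = np.linalg.lstsq(X, demo_a, rcond=None)
policy = lambda s: np.hstack([s, 1.0]) @ W
print("demo length:", len(demo_s), "policy([0.5, 0.5]) ->", policy(np.array([0.5, 0.5])))
```

In this sketch the cloned policy only serves as an initialization; the paper's point is that a standard RL algorithm started from such a policy converges faster and more stably than one exploring from scratch.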