Paper Title


ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search

Authors

Dixant Mittal, Siddharth Aravindan, Wee Sun Lee

Abstract


Online tree-based search algorithms iteratively simulate trajectories and update action-values for a set of states stored in a tree structure. They work reasonably well in practice but fail to effectively utilise the information gathered from similar states. Depending on the smoothness of the action-value function, one approach to overcoming this issue is through online learning, where information is interpolated among similar states; Policy Gradient Search provides a practical algorithm to achieve this. However, Policy Gradient Search lacks an explicit exploration mechanism, which is a key feature of tree-based online search algorithms. In this paper, we propose an efficient and effective online search algorithm called Exploratory Policy Gradient Search (ExPoSe), which leverages information sharing among states by updating the search policy parameters directly, while incorporating a well-defined exploration mechanism during the online search process. We evaluate ExPoSe on a range of decision-making problems, including Atari games, Sokoban, and Hamiltonian cycle search in sparse graphs. The results demonstrate that ExPoSe consistently outperforms other popular online search algorithms across all domains. The ExPoSe source code is available at \textit{\url{https://github.com/dixantmittal/ExPoSe}}.
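To make the core idea concrete, the sketch below illustrates what "updating the search policy parameters directly, while incorporating an exploration mechanism" can look like. This is a minimal, hypothetical illustration, not the paper's actual algorithm: the class name, the count-based exploration bonus, and the REINFORCE-style update are all assumptions chosen for clarity.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ExploratoryPGSearch:
    """Illustrative sketch (not the paper's exact method): a softmax search
    policy whose parameters are updated directly from simulated outcomes,
    with a count-based bonus that encourages rarely tried actions."""

    def __init__(self, n_actions, lr=0.1, bonus_weight=1.0):
        self.theta = [0.0] * n_actions   # search-policy parameters
        self.counts = [0] * n_actions    # visit counts driving exploration
        self.lr = lr
        self.c = bonus_weight

    def act(self):
        """Sample an action from the current softmax policy."""
        probs = softmax(self.theta)
        r, acc = random.random(), 0.0
        for a, p in enumerate(probs):
            acc += p
            if r <= acc:
                return a
        return len(probs) - 1

    def update(self, action, reward):
        """Policy-gradient step on the simulated outcome plus a bonus."""
        self.counts[action] += 1
        bonus = self.c / math.sqrt(self.counts[action])  # decays with visits
        target = reward + bonus
        probs = softmax(self.theta)
        # REINFORCE-style gradient of log pi(action) for a softmax policy:
        # +(1 - p[a]) for the taken action, -p[a] for the others.
        for a in range(len(self.theta)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.theta[a] += self.lr * target * grad

# Usage: a toy 3-armed problem where only action 2 yields reward.
random.seed(0)
search = ExploratoryPGSearch(n_actions=3)
for _ in range(500):
    a = search.act()
    search.update(a, reward=1.0 if a == 2 else 0.0)
best = max(range(3), key=lambda a: search.theta[a])
```

Because every action receives a decaying bonus, the policy keeps sampling under-visited actions early on, then concentrates on the rewarding one as the bonus shrinks; this is the interplay between gradient-based updates and explicit exploration that the abstract describes, rendered in its simplest possible form.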
