Title

Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search

Authors

Arta Seify, Michael Buro

Abstract

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games. In this paper, we describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards (the case in many optimization problems), 2) a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, that guides the search. We gauge the effectiveness of our method in "SameGame"---a popular single-player test domain. Our experimental results indicate that our method outperforms baseline algorithms on several board sizes. Additionally, it is competitive with state-of-the-art search algorithms on a public set of positions.
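
The abstract does not spell out the normalization mechanism. As a rough illustration of why normalization matters with unbounded rewards, the sketch below shows one common approach (min-max rescaling of observed action values into [0, 1] inside a PUCT-style selection rule, as popularized by MuZero); the class and function names here are hypothetical and not taken from the paper.

```python
import math

class MinMaxStats:
    """Tracks the smallest and largest value seen so far, so raw
    (potentially unbounded) action values can be rescaled into [0, 1]."""
    def __init__(self):
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, value):
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value):
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        return value  # too few observations to rescale yet

def puct_score(parent_visits, child_visits, child_value_sum,
               prior, stats, c_puct=1.25):
    """PUCT selection score with a normalized Q term, keeping the
    exploitation term on the same scale as the exploration bonus
    even when rewards are unbounded."""
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    q = stats.normalize(q)
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

Without the rescaling, a SameGame-style score in the hundreds would dwarf the exploration term and collapse the search onto the first good line found; the exact mechanism used in the paper may differ.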
