Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy

Paper Title

Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy

Authors

Lordeiro, Igor Q.; Haddad, Diego B.; Cardoso, Douglas O.

Abstract

A popular computer puzzle, the game of Minesweeper requires its human players to combine luck and strategy to succeed. Analyzing these aspects more formally, in our research we assessed the feasibility of a novel methodology based on Reinforcement Learning as an approach to the problem this game presents. For this purpose we employed Multi-Armed Bandit algorithms, carefully adapted so that they could be used to define autonomous computational players that make the best use of certain peculiarities of the game. Experimental evaluation showed that this approach was indeed successful, especially on smaller game boards such as the standard beginner level. Nevertheless, the main contribution of this work is a detailed examination of Minesweeper from a learning perspective, which led to several original insights that are thoroughly discussed.
