Min-Max Q-Learning for Multi-Player Pursuit-Evasion Games

Paper Title

Min-Max Q-Learning for Multi-Player Pursuit-Evasion Games

Authors

Jhanani Selvakumar, Efstathios Bakolas

Abstract

In this paper, we address a pursuit-evasion game involving multiple players by utilizing tools and techniques from reinforcement learning and matrix game theory. In particular, we consider the problem of steering an evader to a goal destination while avoiding capture by multiple pursuers, which is, in general, a high-dimensional and computationally intractable problem. In our proposed approach, we first formulate the multi-agent pursuit-evasion game as a sequence of discrete matrix games. Next, to simplify the solution process, we transform the high-dimensional state space into a low-dimensional manifold and the continuous action space into a feature-based space, which is a discrete abstraction of the original space. Based on these transformed state and action spaces, we then employ min-max Q-learning to generate the entries of the payoff matrix of the game, and subsequently obtain the optimal action for the evader at each stage. Finally, we present extensive numerical simulations to evaluate the performance of the proposed learning-based evading strategy in terms of the evader's ability to reach the desired target location without being captured, as well as its computational efficiency.
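The abstract does not state the update rule, so the following is only a minimal Python sketch of tabular min-max Q-learning, written under loudly stated assumptions. The state labels and the action names (`EVADER_ACTIONS`, `PURSUER_ACTIONS`) are hypothetical placeholders for the paper's low-dimensional state and feature-based action abstractions. The class `MinMaxQ` and its parameters `alpha` and `gamma` are illustrative. Each stage's matrix game is solved over pure strategies (the evader's max-min security level). This is a simplification: the mixed-strategy version of minimax-Q instead solves a small linear program at every stage.

```python
from collections import defaultdict

# Hypothetical discrete action abstractions (placeholders, not the paper's sets).
EVADER_ACTIONS = ["toward_goal", "away_from_nearest", "tangent"]
PURSUER_ACTIONS = ["intercept", "chase"]


class MinMaxQ:
    """Tabular min-max Q-learning, simplified to pure-strategy stage games."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # q[state][(evader_action, pursuer_action)] -> estimated payoff to the evader
        self.q = defaultdict(lambda: defaultdict(float))

    def payoff_matrix(self, state):
        """Stage-game matrix: rows are evader actions, columns are pursuer actions."""
        return [[self.q[state][(a, o)] for o in PURSUER_ACTIONS]
                for a in EVADER_ACTIONS]

    def best_action(self, state):
        """Evader's max-min action and value (ties go to the first action listed)."""
        matrix = self.payoff_matrix(state)
        worst_cases = [min(row) for row in matrix]  # pursuers minimize each row
        best = max(range(len(EVADER_ACTIONS)), key=lambda i: worst_cases[i])
        return EVADER_ACTIONS[best], worst_cases[best]

    def update(self, state, a, o, reward, next_state, terminal=False):
        """Move q(s, a, o) toward r + gamma * V(s'), where V is the max-min value."""
        _, next_value = self.best_action(next_state)
        target = reward if terminal else reward + self.gamma * next_value
        old = self.q[state][(a, o)]
        self.q[state][(a, o)] = old + self.alpha * (target - old)


if __name__ == "__main__":
    agent = MinMaxQ(alpha=0.5, gamma=0.9)
    agent.update("near_goal", "toward_goal", "chase", 1.0, "goal", terminal=True)
    agent.update("near_goal", "toward_goal", "intercept", 0.5, "goal", terminal=True)
    print(agent.best_action("near_goal"))  # ('toward_goal', 0.25)
```

In the usage example, `toward_goal` is chosen because its worst case over the pursuer actions (0.25) exceeds the all-zero rows of the untried actions. Solving over pure strategies keeps the sketch dependency-free. A faithful minimax-Q agent would replace `best_action` with an LP solver that returns a mixed strategy.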
