论文标题
具有同步目标的随机游戏
Stochastic Games with Synchronizing Objectives
论文作者
论文摘要
我们考虑在无限多轮的有限图上播放的两个玩家随机游戏。随机游戏通过添加对手玩家和两人确定性游戏来概括马尔可夫决策过程(MDP),通过添加随机性。游戏的结果是游戏图状态上的一系列发行序列。我们考虑同步目标,这些目标需要概率质量堆积在一组目标状态中,始终,一次,无限地,或总是在结果序列中的某个点之后。以及确定获胜的获胜模式(如果累积的概率等于1)和几乎是胜利的获胜方式(如果累积的概率任意接近1)。 我们提出算法来计算这些同步模式中的每一个的获胜分布集,表明相应的决策问题是pspace-complete,用于一次性同步一次,并且经常经常同步,并且在某个点之后始终且始终同步。这些界限非常符合MDP的特殊情况,而算法解决方案和证明技术也更加涉及确定性游戏。这是因为这些游戏具有不完美的信息的味道,尤其没有确定它们,即使游戏图中没有随机选择,也需要考虑随机策略。此外,结合游戏图中的随机性,有限的内存策略通常不够(用于无限地同步)。
We consider two-player stochastic games played on a finite graph for infinitely many rounds. Stochastic games generalize both Markov decision processes (MDP) by adding an adversary player, and two-player deterministic games by adding stochasticity. The outcome of the game is a sequence of distributions over the states of the game graph. We consider synchronizing objectives, which require the probability mass to accumulate in a set of target states, either always, once, infinitely often, or always after some point in the outcome sequence; and the winning modes of sure winning (if the accumulated probability is equal to 1) and almost-sure winning (if the accumulated probability is arbitrarily close to 1). We present algorithms to compute the set of winning distributions for each of these synchronizing modes, showing that the corresponding decision problem is PSPACE-complete for synchronizing once and infinitely often, and PTIME-complete for synchronizing always and always after some point. These bounds are remarkably in line with the special case of MDPs, while the algorithmic solution and proof technique are considerably more involved, even for deterministic games. This is because those games have a flavour of imperfect information, in particular they are not determined and randomized strategies need to be considered, even if there is no stochastic choice in the game graph. Moreover, in combination with stochasticity in the game graph, finite-memory strategies are not sufficient in general (for synchronizing infinitely often).