Paper title
The Complexity of Markov Equilibrium in Stochastic Games
Paper authors
Paper abstract
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games, where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies are absolute constants. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL), where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. All previous work on learning Markov CCE policies required time and sample complexity exponential in the number of players.
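For context, a standard way to formalize the approximation notion discussed in the abstract is through per-player discounted value functions. The sketch below is a generic textbook-style formulation of an ε-approximate coarse correlated equilibrium in a discounted stochastic game, not notation taken from the paper; the symbols (joint correlated policy π, deviation policy π_i', value V_i, discount factor γ, initial state distribution μ) are assumptions made here for illustration.

```latex
% A joint (possibly correlated) Markov policy \pi is an \epsilon-approximate CCE if
% no player i can improve its value by more than \epsilon through a unilateral
% deviation \pi_i', while the remaining players continue to follow \pi_{-i}:
\max_{\pi_i'} \; V_i^{\pi_i',\, \pi_{-i}}(\mu) \;\le\; V_i^{\pi}(\mu) + \epsilon
\qquad \text{for every player } i,
% where
V_i^{\pi}(\mu) \;=\; \mathbb{E}_{s_0 \sim \mu,\; a_t \sim \pi(\cdot \mid s_t)}
\Bigl[\, \textstyle\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t) \Bigr]
% is player i's expected discounted return under discount factor \gamma \in (0,1)
% and initial state distribution \mu.
```

The key distinction from a Nash or correlated equilibrium is that the deviating player commits to π_i' without observing the correlation signal; the paper's hardness results concern the stationary Markov version of this notion, while its positive algorithmic result learns a nonstationary Markov policy satisfying it.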