Minimax样本复杂性的基于转弯的随机游戏

论文标题

Minimax样本复杂性的基于转弯的随机游戏

Minimax Sample Complexity for Turn-based Stochastic Game

论文作者

Cui, Qiwen, Yang, Lin F.

论文摘要

多方强化学习的经验成功令人鼓舞，而很少有理论保证。在这项工作中，我们证明了插入式求解器方法，可能是最自然的增强学习算法，可以实现基于转弯的随机游戏（TBSG）的最小样本复杂性。具体而言，我们通过利用允许从任意状态行动对采样的“模拟器”来计划经验TBSG。我们表明，经验NASH均衡策略是真实TBSG中近似的NASH平衡策略，并给出了问题依赖性和与问题无关的结合。我们开发了吸收TBSG和奖励扰动技术，以应对复杂的统计依赖性。关键的想法是人为地在TBSG中引入次级优势差距，然后NASH均衡策略在于有限的集合。

The empirical success of Multi-agent reinforcement learning is encouraging, while few theoretical guarantees have been revealed. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic game (TBSG). Specifically, we plan in an empirical TBSG by utilizing a `simulator' that allows sampling from arbitrary state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG and give both problem-dependent and problem-independent bound. We develop absorbing TBSG and reward perturbation techniques to tackle the complex statistical dependence. The key idea is artificially introducing a suboptimality gap in TBSG and then the Nash equilibrium strategy lies in a finite set.

下载PDF全文

下载文献需遵守相关版权规定

论文标题