论文标题

Minimax样本复杂性的基于转弯的随机游戏

Minimax Sample Complexity for Turn-based Stochastic Game

论文作者

Cui, Qiwen, Yang, Lin F.

论文摘要

多方强化学习的经验成功令人鼓舞,而很少有理论保证。在这项工作中,我们证明了插入式求解器方法,可能是最自然的增强学习算法,可以实现基于转弯的随机游戏(TBSG)的最小样本复杂性。具体而言,我们通过利用允许从任意状态行动对采样的“模拟器”来计划经验TBSG。我们表明,经验NASH均衡策略是真实TBSG中近似的NASH平衡策略,并给出了问题依赖性和与问题无关的结合。我们开发了吸收TBSG和奖励扰动技术,以应对复杂的统计依赖性。关键的想法是人为地在TBSG中引入次级优势差距,然后NASH均衡策略在于有限的集合。

The empirical success of Multi-agent reinforcement learning is encouraging, while few theoretical guarantees have been revealed. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic game (TBSG). Specifically, we plan in an empirical TBSG by utilizing a `simulator' that allows sampling from arbitrary state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG and give both problem-dependent and problem-independent bound. We develop absorbing TBSG and reward perturbation techniques to tackle the complex statistical dependence. The key idea is artificially introducing a suboptimality gap in TBSG and then the Nash equilibrium strategy lies in a finite set.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源