Paper Title
A General Framework for Learning Mean-Field Games
Paper Authors
Paper Abstract
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG, and demonstrates that naively combining reinforcement learning with the fixed-point approach of classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, together with an analysis of their convergence properties and computational complexities. Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P based on Q-learning and TRPO, respectively, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing multi-agent reinforcement learning algorithms in the $N$-player setting.
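To make the abstract's description of GMF-V-Q concrete, the following is a minimal, self-contained sketch of the general idea: an outer fixed-point iteration that alternates between Q-learning a best response against a frozen mean field and pushing the mean field forward under the resulting smoothed (softmax) policy. This is not the paper's reference implementation; the toy dynamics, the population-dependent reward, and all function names (e.g. q_learning_best_response, induced_mean_field) are hypothetical, and the mean-field update is a simplified stationary variant rather than the full time-dependent flow treated in the paper.

```python
import numpy as np

# Illustrative sketch of the GMF-V-Q loop structure (hypothetical toy problem,
# not the paper's algorithm verbatim).

n_states, n_actions = 5, 3
rng = np.random.default_rng(0)
# Random transition kernel P[s, a] = next-state distribution (placeholder dynamics).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def reward(s, a, mean_field):
    # Hypothetical population-dependent reward: occupying a crowded state pays less.
    return 1.0 - mean_field[s] - 0.1 * a

def q_learning_best_response(mean_field, steps=2000, alpha=0.1, gamma=0.9, eps=0.1):
    """Inner loop: Q-learning against a fixed (frozen) mean field."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = rng.choice(n_states, p=P[s, a])
        r = reward(s, a, mean_field)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

def softmax_policy(Q, temperature=0.5):
    """Smoothed (Boltzmann) policy -- the policy smoothing the abstract refers to."""
    z = np.exp(Q / temperature)
    return z / z.sum(axis=1, keepdims=True)

def induced_mean_field(policy, mean_field):
    """Push the state distribution forward one step under the smoothed policy."""
    new_mf = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            new_mf += mean_field[s] * policy[s, a] * P[s, a]
    return new_mf

# Outer fixed-point iteration over (policy, mean field).
mean_field = np.full(n_states, 1.0 / n_states)
for _ in range(20):
    Q = q_learning_best_response(mean_field)
    pi = softmax_policy(Q)
    mean_field = induced_mean_field(pi, mean_field)
print("approximate equilibrium mean field:", np.round(mean_field, 3))
```

The softmax temperature plays the role of the smoothing parameter: with a greedy (argmax) policy the outer iteration can oscillate, which is the instability the abstract attributes to naively combining reinforcement learning with the classical fixed-point approach.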