Paper Title
A General Framework for Learning Mean-Field Games
Paper Authors
Paper Abstract
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG, and demonstrates that naively combining reinforcement learning with the fixed-point approach of classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, together with an analysis of their convergence properties and computational complexities. Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P based on Q-learning and TRPO, respectively, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing multi-agent reinforcement learning algorithms in the $N$-player setting.
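To make the abstract's description of GMF-V-Q concrete, the following is a minimal, self-contained sketch of the general idea: an outer fixed-point iteration that alternates between Q-learning a best response against a frozen mean field and pushing the mean field forward under the resulting smoothed (softmax) policy. This is not the paper's reference implementation; the toy dynamics, the population-dependent reward, and all function names (e.g. q_learning_best_response, induced_mean_field) are hypothetical, and the mean-field update is a simplified stationary variant rather than the full time-dependent flow treated in the paper.

```python
import numpy as np

# Illustrative sketch of the GMF-V-Q loop structure (hypothetical toy problem,
# not the paper's algorithm verbatim).

n_states, n_actions = 5, 3
rng = np.random.default_rng(0)
# Random transition kernel P[s, a] = next-state distribution (placeholder dynamics).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def reward(s, a, mean_field):
    # Hypothetical population-dependent reward: occupying a crowded state pays less.
    return 1.0 - mean_field[s] - 0.1 * a

def q_learning_best_response(mean_field, steps=2000, alpha=0.1, gamma=0.9, eps=0.1):
    """Inner loop: Q-learning against a fixed (frozen) mean field."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = rng.choice(n_states, p=P[s, a])
        r = reward(s, a, mean_field)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

def softmax_policy(Q, temperature=0.5):
    """Smoothed (Boltzmann) policy -- the policy smoothing the abstract refers to."""
    z = np.exp(Q / temperature)
    return z / z.sum(axis=1, keepdims=True)

def induced_mean_field(policy, mean_field):
    """Push the state distribution forward one step under the smoothed policy."""
    new_mf = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            new_mf += mean_field[s] * policy[s, a] * P[s, a]
    return new_mf

# Outer fixed-point iteration over (policy, mean field).
mean_field = np.full(n_states, 1.0 / n_states)
for _ in range(20):
    Q = q_learning_best_response(mean_field)
    pi = softmax_policy(Q)
    mean_field = induced_mean_field(pi, mean_field)
print("approximate equilibrium mean field:", np.round(mean_field, 3))
```

The softmax temperature plays the role of the smoothing parameter: with a greedy (argmax) policy the outer iteration can oscillate, which is the instability the abstract attributes to naively combining reinforcement learning with the classical fixed-point approach.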