Paper Title

A Comparison of Self-Play Algorithms Under a Generalized Framework

Authors

Daniel Hernandez, Kevin Denamganai, Sam Devlin, Spyridon Samothrakis, James Alfred Walker

Abstract

Throughout scientific history, overarching theoretical frameworks have allowed researchers to grow beyond personal intuitions and culturally biased theories. They make it possible to verify and replicate existing findings, and to link disconnected results. The notion of self-play, albeit often cited in multiagent Reinforcement Learning, has never been grounded in a formal model. We present a formalized framework, with clearly defined assumptions, which encapsulates the meaning of self-play as abstracted from various existing self-play algorithms. This framework is framed as an approximation to a theoretical solution concept for multiagent training. In a simple environment, we qualitatively measure how well a subset of the captured self-play methods approximate this solution when paired with the famous PPO algorithm. We also provide insights on interpreting quantitative metrics of performance for self-play training. Our results indicate that, throughout training, various self-play definitions exhibit cyclic policy evolutions.
