DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

Paper Title

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

Authors

Youpeng Zhao, Jian Zhao, Xunhan Hu, Wengang Zhou, Houqiang Li

Abstract

Recent years have witnessed great breakthroughs of deep reinforcement learning (DRL) in various perfect- and imperfect-information games. Among these games, DouDizhu, a popular card game in China, is very challenging due to its imperfect information, large state space, elements of collaboration, and massive number of possible moves from turn to turn. Recently, a DouDizhu AI system called DouZero was proposed. Trained using a traditional Monte Carlo method with deep neural networks and a self-play procedure, without the abstraction of human prior knowledge, DouZero has outperformed all existing DouDizhu AI programs. In this work, we propose to enhance DouZero by introducing opponent modeling. In addition, we propose a novel coach network to further boost the performance of DouZero and accelerate its training process. With these two techniques integrated into DouZero, our DouDizhu AI system achieves better performance and ranks at the top of the Botzone leaderboard among more than 400 AI agents, including DouZero.
