Unlocking the Potential of Deep Counterfactual Value Networks

Paper Title

Unlocking the Potential of Deep Counterfactual Value Networks

Authors

Ryan Zarick, Bryan Pellegrino, Noam Brown, Caleb Banister

Abstract

Deep counterfactual value networks combined with continual resolving provide a way to conduct depth-limited search in imperfect-information games. However, since their introduction in the DeepStack poker AI, deep counterfactual value networks have not seen widespread adoption. In this paper we introduce several improvements to deep counterfactual value networks, as well as counterfactual regret minimization, and analyze the effects of each change. We combined these improvements to create the poker AI Supremus. We show that while a reimplementation of DeepStack loses head-to-head against the strong benchmark agent Slumbot, Supremus successfully beats Slumbot by an extremely large margin and also achieves a lower exploitability than DeepStack against a local best response. Together, these results show that with our key improvements, deep counterfactual value networks can achieve state-of-the-art performance.
