Paper Title

Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning

Paper Authors

Weiming Liu, Bin Li, Julian Togelius

Paper Abstract

Counterfactual Regret Minimization (CFR) has achieved many fascinating results in solving large-scale Imperfect Information Games (IIGs). Neural network approximation CFR (neural CFR) is one of the promising techniques that can reduce computation and memory consumption by generalizing decision information between similar states. Current neural CFR algorithms have to approximate cumulative regrets. However, efficient and accurate approximation in a large-scale IIG is still a tough challenge. In this paper, a new CFR variant, Recursive CFR (ReCFR), is proposed. In ReCFR, Recursive Substitute Values (RSVs) are learned and used to replace cumulative regrets. It is proven that ReCFR can converge to a Nash equilibrium at a rate of $O({1}/{\sqrt{T}})$. Based on ReCFR, a new model-free neural CFR with bootstrap learning, Neural ReCFR-B, is proposed. Due to the recursive and non-cumulative nature of RSVs, Neural ReCFR-B has lower-variance training targets than other neural CFRs. Experimental results show that Neural ReCFR-B is competitive with the state-of-the-art neural CFR algorithms at a much lower training cost.
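For context on the cumulative regrets that neural CFR must approximate (and that ReCFR's RSVs replace), below is a minimal sketch of the standard regret-matching rule that tabular CFR applies at each information set. This illustrates the textbook rule underlying CFR variants, not ReCFR itself; the function name is ours.

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Convert cumulative counterfactual regrets into a strategy.

    Standard regret-matching rule: normalize the positive part of the
    cumulative regrets into action probabilities; fall back to uniform
    play when no action has positive regret.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.ones_like(positive) / len(positive)

# Example: three actions with cumulative regrets [2, -1, 1]
# are played in proportion 2 : 0 : 1.
print(regret_matching(np.array([2.0, -1.0, 1.0])))  # [0.6667 0. 0.3333]
```

Because this rule consumes regrets accumulated over all past iterations, a neural approximator must track an ever-growing, high-variance target; the abstract's point is that ReCFR's recursive, non-cumulative substitute values avoid exactly that.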
