Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

Paper title

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

Authors

Morrill, Dustin; D'Orazio, Ryan; Lanctot, Marc; Wright, James R.; Bowling, Michael; Greenwald, Amy R.

Abstract

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
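The abstract frames hindsight rationality as no-regret learning with respect to a set of deviations, and builds EFR on top of CFR. As a minimal sketch of that underlying idea, and not of EFR itself, the Python below implements regret matching, the local update rule CFR commonly uses, at a single decision point, measuring regret only against external deviations (always switching to one fixed action). The helper names `regret_matching` and `RegretMatcher`, the rock-paper-scissors payoffs, and the fixed opponent strategy are illustrative assumptions; time selection and the richer behavioral deviation types the paper studies are not modeled.

```python
from typing import List, Sequence


def regret_matching(cumulative_regret: Sequence[float]) -> List[float]:
    """Map cumulative regrets to a strategy: play each action in proportion to its
    positive regret, or uniformly if no action has positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(cumulative_regret)] * len(cumulative_regret)


class RegretMatcher:
    """Hypothetical helper: a no-external-regret learner at one decision point.
    Illustrative sketch only; this is not the paper's EFR algorithm."""

    def __init__(self, num_actions: int) -> None:
        self.cumulative_regret = [0.0] * num_actions
        self.strategy_sum = [0.0] * num_actions
        self.rounds = 0

    def strategy(self) -> List[float]:
        return regret_matching(self.cumulative_regret)

    def update(self, action_values: Sequence[float]) -> None:
        """Given this round's value of every action, add each action's regret
        (its value minus the current strategy's expected value)."""
        policy = self.strategy()
        expected = sum(p * v for p, v in zip(policy, action_values))
        for a, v in enumerate(action_values):
            self.cumulative_regret[a] += v - expected
            self.strategy_sum[a] += policy[a]
        self.rounds += 1

    def average_regret(self) -> float:
        """Largest average regret against any fixed action (external deviation)."""
        return max(max(self.cumulative_regret), 0.0) / max(self.rounds, 1)

    def average_strategy(self) -> List[float]:
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum]


# Rock-paper-scissors payoffs for the learner: PAYOFF[mine][theirs].
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
OPPONENT = [0.5, 0.3, 0.2]  # assumed fixed opponent: rock, paper, scissors

learner = RegretMatcher(3)
for _ in range(1000):
    # Expected payoff of each of the learner's actions against the fixed opponent.
    values = [sum(PAYOFF[a][b] * OPPONENT[b] for b in range(3)) for a in range(3)]
    learner.update(values)

print([round(p, 3) for p in learner.average_strategy()])
print(round(learner.average_regret(), 4))
```

Against this rock-heavy opponent, the average strategy concentrates on paper and the average external regret shrinks toward zero, which is the single-agent "no regret" guarantee in its simplest form. Hindsight rationality with respect to stronger deviation sets asks for the same vanishing regret against a larger class of alternative behaviors.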