Paper Title

NeuPL: Neural Population Learning

Paper Authors

Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel

Paper Abstract

Learning in strategy games (e.g. StarCraft, poker) requires the discovery of diverse policies. This is often achieved by iteratively training new policies against existing ones, growing a policy population that is robust to exploitation. This iterative approach suffers from two issues in real-world games: a) under a finite budget, the approximate best-response operator at each iteration needs to be truncated, resulting in under-trained good-responses, rather than true best-responses, populating the population; b) repeatedly learning basic skills at each iteration is wasteful and becomes intractable in the presence of increasingly strong opponents. In this work, we propose Neural Population Learning (NeuPL) as a solution to both issues. NeuPL offers convergence guarantees to a population of best-responses under mild assumptions. By representing a population of policies within a single conditional model, NeuPL enables transfer learning across policies. Empirically, we show the generality, improved performance, and efficiency of NeuPL across several test domains. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands.
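
The abstract does not spell out an architecture, but its central mechanism, representing a whole population of policies within one conditional network, can be sketched in code. The sketch below is illustrative only, not the authors' implementation: the class name `NeuralPopulation`, the use of PyTorch, and all layer sizes are assumptions. Following the abstract, each policy is selected by conditioning on an identity vector (here taken to be its mixture over opponents), and all policies share one torso, which is what enables transfer of basic skills across the population.

```python
# Minimal sketch (assumptions throughout): a single conditional network
# representing a population of policies, as described in the NeuPL abstract.
import torch
import torch.nn as nn


class NeuralPopulation(nn.Module):
    """One network, many policies: policy i is obtained by conditioning the
    shared torso on sigma_i, a vector identifying that policy (illustratively,
    its mixture over opponents in the population)."""

    def __init__(self, obs_dim: int, n_actions: int, pop_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Linear(pop_size, hidden)    # encodes the policy-identity vector
        self.torso = nn.Sequential(                 # shared by all policies -> transfer
            nn.Linear(obs_dim + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)

    def forward(self, obs: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        z = self.embed(sigma)                       # which policy are we acting as?
        h = self.torso(torch.cat([obs, z], dim=-1))
        return self.policy_head(h)                  # action logits for that policy


# Usage: act as policy 2 of a 4-policy population (all sizes are illustrative).
pop_size, obs_dim, n_actions = 4, 16, 5
net = NeuralPopulation(obs_dim, n_actions, pop_size)
sigma = torch.softmax(torch.randn(pop_size, pop_size), dim=-1)  # one identity row per policy
obs = torch.randn(1, obs_dim)
logits = net(obs, sigma[2].unsqueeze(0))
```

Because every policy shares the same torso, growing the population reuses previously learned skills rather than relearning them from scratch at each iteration, which is the efficiency and transfer argument the abstract makes.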
