Paper Title
Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games
Paper Authors
Paper Abstract
Recent advances in Deep Reinforcement Learning (DRL) have largely focused on improving the performance of agents with the aim of replacing humans in known and well-defined environments. The use of these techniques as a game design tool for video game production, where the aim is instead to create Non-Player Character (NPC) behaviors, has received relatively little attention until recently. Turn-based strategy games like Roguelikes, for example, present unique challenges to DRL. In particular, the categorical nature of their complex game state, composed of many entities with different attributes, requires agents able to learn how to compare and prioritize these entities. Moreover, this complexity often leads to agents that overfit to states seen during training and that are unable to generalize in the face of design changes made during development. In this paper we propose two network architectures which, when combined with a \emph{procedural loot generation} system, are able to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions. The first is based on a dense embedding of the categorical input space that abstracts the discrete observation model and renders trained agents more able to generalize. The second proposed architecture is more general and is based on a Transformer network able to reason relationally about inputs and input attributes. Our experimental evaluation demonstrates that the new agents have better adaptation capacity than a baseline architecture, making this framework more robust to dynamic gameplay changes during development. Based on the results shown in this paper, we believe that these solutions represent a step towards making DRL more accessible to the gaming industry.
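To make the first architecture concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of a dense embedding over a categorical entity state. Each game entity is a set of categorical attribute IDs; each attribute has its own embedding table, an entity is the concatenation of its attribute embeddings, and the state encoding is the mean over all entities, so the output size does not depend on how many entities the procedural generator produces. All attribute names and vocabulary sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8
# Illustrative vocabulary size per categorical attribute of a loot entity.
ATTR_VOCAB = {"item_type": 16, "rarity": 5, "modifier": 10}

# One (randomly initialized, stand-in for learned) embedding table per attribute.
tables = {name: rng.normal(size=(vocab, EMBED_DIM))
          for name, vocab in ATTR_VOCAB.items()}

def embed_entity(entity):
    """Concatenate the embedding of each categorical attribute of one entity."""
    return np.concatenate([tables[name][entity[name]] for name in ATTR_VOCAB])

def encode_state(entities):
    """Mean-pool entity embeddings into a fixed-size state vector."""
    return np.mean([embed_entity(e) for e in entities], axis=0)

# Two loot items with different attribute values.
state = encode_state([
    {"item_type": 3, "rarity": 1, "modifier": 0},
    {"item_type": 7, "rarity": 4, "modifier": 2},
])
print(state.shape)  # fixed size regardless of entity count: (24,)
```

Because the encoding lives in the continuous embedding space rather than in a one-hot index space, new attribute combinations introduced by later design changes map to nearby vectors instead of entirely unseen inputs, which is the generalization property the abstract claims for this architecture.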