Understanding Game-Playing Agents with Natural Language Annotations

Paper Title

Understanding Game-Playing Agents with Natural Language Annotations

Authors

Nicholas Tomlin, Andre He, Dan Klein

Abstract

We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
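
The linear probing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the activations are synthetic stand-ins for a policy network's intermediate representations, the binary label stands in for "the comment mentions a term such as atari", and every name here (train_linear_probe, probe_accuracy, make_synthetic_layer, the signal parameter) is hypothetical. The probe itself is ordinary logistic regression fit by gradient descent.

```python
import math
import random


def _sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = _sigmoid(logit) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples where the probe's sign matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        logit = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((logit > 0) == (y == 1))
    return correct / len(labels)


def make_synthetic_layer(n, dim, signal, rng):
    """Hypothetical stand-in for one layer's activations.

    `signal` controls how linearly decodable the label is: 0.0 means the
    activations carry no information about the label.
    """
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if y == 1 else -signal
        features.append(x)
        labels.append(y)
    return features, labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed setup: an "early" layer with no signal, a "late" layer with some.
    for name, signal in [("early layer", 0.0), ("late layer", 2.0)]:
        train_x, train_y = make_synthetic_layer(400, 8, signal, rng)
        test_x, test_y = make_synthetic_layer(200, 8, signal, rng)
        w, b = train_linear_probe(train_x, train_y)
        print(name, "held-out probe accuracy:", probe_accuracy(w, b, test_x, test_y))
```

In a real experiment the synthetic features would be replaced by activations extracted from each layer of the policy network, and probe accuracy would be compared across layers and against a control baseline.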
