Paper Title

History Compression via Language Models in Reinforcement Learning

Authors

Fabian Paischer, Thomas Adler, Vihang Patil, Angela Bitto-Nemling, Markus Holzleitner, Sebastian Lehner, Hamid Eghbal-zadeh, Sepp Hochreiter

Abstract

In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training of the Transformer, we introduce FrozenHopfield, which automatically associates observations with pretrained token embeddings. To form these associations, a modern Hopfield network stores these token embeddings, which are retrieved by queries that are obtained by a random but fixed projection of observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer for history representation as a memory module. Since a representation of the past need not be learned, HELM is much more sample efficient than competitors. On Minigrid and Procgen environments HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.
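To make the FrozenHopfield association step concrete, below is a minimal sketch of the retrieval described in the abstract: a random but fixed projection maps an observation feature vector into the token-embedding space, and a one-step modern-Hopfield retrieval (softmax attention over the frozen pretrained token embeddings) returns a convex combination of those embeddings. The class name, dimensions, initialization scale, and the inverse-temperature `beta` are illustrative assumptions, not the authors' exact implementation; see https://github.com/ml-jku/helm for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenHopfieldSketch(nn.Module):
    """Sketch of associating observations with frozen pretrained token embeddings.

    Assumptions (hypothetical, for illustration only): observations arrive as
    flat feature vectors of size `obs_dim`, and `token_embeddings` is the
    pretrained language model's embedding matrix of shape (vocab_size, embed_dim).
    """

    def __init__(self, token_embeddings: torch.Tensor, obs_dim: int, beta: float = 1.0):
        super().__init__()
        # Pretrained token embeddings act as stored patterns; they stay frozen.
        self.register_buffer("embeddings", token_embeddings)
        embed_dim = token_embeddings.shape[1]
        # Random but fixed projection from observation space to embedding space.
        self.register_buffer("proj", torch.randn(embed_dim, obs_dim) / obs_dim ** 0.5)
        self.beta = beta  # illustrative inverse temperature for the retrieval

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim) -> query in token-embedding space.
        query = obs @ self.proj.t()                        # (batch, embed_dim)
        # One-step modern Hopfield retrieval = softmax attention over embeddings.
        scores = self.beta * query @ self.embeddings.t()   # (batch, vocab_size)
        weights = F.softmax(scores, dim=-1)
        # Convex combination of frozen token embeddings: the pseudo-token fed
        # to the frozen language Transformer as input to its memory.
        return weights @ self.embeddings                   # (batch, embed_dim)
```

In this reading, the retrieved pseudo-token is what the frozen language Transformer consumes at each step, so the memory module requires no gradient updates and the actor-critic heads are the only trained components besides the observation encoder.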
