Paper Title


TransDreamer: Reinforcement Learning with Transformer World Models

Authors

Chang Chen, Yi-Fu Wu, Jaesik Yoon, Sungjin Ahn

Abstract


The Dreamer agent provides various benefits of Model-Based Reinforcement Learning (MBRL) such as sample efficiency, reusable knowledge, and safe planning. However, its world model and policy networks inherit the limitations of recurrent neural networks and thus an important question is how an MBRL framework can benefit from the recent advances of transformers and what the challenges are in doing so. In this paper, we propose a transformer-based MBRL agent, called TransDreamer. We first introduce the Transformer State-Space Model, a world model that leverages a transformer for dynamics predictions. We then share this world model with a transformer-based policy network and obtain stability in training a transformer-based RL agent. In experiments, we apply the proposed model to 2D visual RL and 3D first-person visual RL tasks both requiring long-range memory access for memory-based reasoning. We show that the proposed model outperforms Dreamer in these complex tasks.
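The abstract's key architectural point is that the Transformer State-Space Model replaces the RNN's step-by-step recurrence with attention, so any past latent state can be accessed directly when predicting dynamics. A minimal numpy sketch of the causal self-attention such a model relies on (toy dimensions and weight names are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention: each timestep attends to itself
    and all earlier timesteps directly, with no recurrent bottleneck."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf                   # block attention to future steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                       # per-step context for dynamics prediction

rng = np.random.default_rng(0)
T, d = 8, 4                                  # toy sequence length and latent dim
latents = rng.normal(size=(T, d))            # stand-in for a latent-state sequence
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
ctx = causal_self_attention(latents, w_q, w_k, w_v)
print(ctx.shape)                             # one context vector per timestep
```

Because of the causal mask, changing a future latent leaves all earlier context vectors untouched, while each step still sees the entire past at constant path length — the property the abstract contrasts with the limitations RNN-based Dreamer inherits.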
