Paper Title

On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning

Authors

Takagi, Shiro

Abstract

We empirically investigate how pre-training on data of different modalities, such as language and vision, affects fine-tuning of Transformer-based models to Mujoco offline reinforcement learning tasks. Analysis of the internal representation reveals that the pre-trained Transformers acquire largely different representations before and after pre-training, but acquire less information of data in fine-tuning than the randomly initialized one. A closer look at the parameter changes of the pre-trained Transformers reveals that their parameters do not change that much and that the bad performance of the model pre-trained with image data could partially come from large gradients and gradient clipping. To study what information the Transformer pre-trained with language data utilizes, we fine-tune this model with no context provided, finding that the model learns efficiently even without context information. Subsequent follow-up analysis supports the hypothesis that pre-training with language data is likely to make the Transformer get context-like information and utilize it to solve the downstream task.
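To make the analyses mentioned in the abstract more concrete, below is a minimal illustrative sketch (not the authors' code) of three of the measurements it describes: comparing a layer's representations before and after fine-tuning with linear CKA, measuring how far parameters move from the pre-trained checkpoint, and logging the pre-clip gradient norm when global-norm gradient clipping is applied. The helper names `linear_cka`, `parameter_shift`, and `clipped_step` are hypothetical and written only for this note.

```python
import torch


def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA similarity between two activation matrices of shape (n, d).

    Higher values mean the two sets of representations are more similar.
    """
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = (y.T @ x).norm() ** 2
    den = (x.T @ x).norm() * (y.T @ y).norm()
    return num / den


def parameter_shift(model_before: torch.nn.Module,
                    model_after: torch.nn.Module) -> float:
    """Relative L2 distance of all parameters from the pre-trained checkpoint."""
    diff, base = 0.0, 0.0
    for p0, p1 in zip(model_before.parameters(), model_after.parameters()):
        diff += (p1.detach() - p0.detach()).norm() ** 2
        base += p0.detach().norm() ** 2
    return (diff / base).sqrt().item()


def clipped_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                 loss: torch.Tensor, max_norm: float = 0.25) -> float:
    """One optimizer step with global-norm gradient clipping.

    Returns the pre-clip gradient norm, the quantity the abstract suggests
    is large for the model pre-trained on image data.
    """
    optimizer.zero_grad()
    loss.backward()
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return float(grad_norm)
```

In the paper's setting, such quantities would be tracked for language-pre-trained, image-pre-trained, and randomly initialized Transformers fine-tuned on the Mujoco offline RL datasets; the sketch fixes only the bookkeeping, not the experimental details.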
