Paper Title
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether the internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. Through offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.
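The abstract outlines a two-stage recipe: pretrain a world model offline on data pooled from multiple tasks, then continue training it online on a new target task. The sketch below illustrates that recipe with a toy latent dynamics model in PyTorch; all names (WorldModel, pretrain_offline, finetune_online) and the loss terms are illustrative assumptions, not the actual XTRA or EfficientZero implementation.

```python
# Minimal sketch of offline multi-task pretraining followed by online
# cross-task finetuning of a world model. All components here are
# hypothetical stand-ins for the models described in the paper.

import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Toy latent dynamics model: encodes observations, predicts next latent and reward."""
    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.dynamics = nn.Linear(latent_dim + act_dim, latent_dim)
        self.reward_head = nn.Linear(latent_dim, 1)

    def forward(self, obs, act):
        z = self.encoder(obs)
        z_next = self.dynamics(torch.cat([z, act], dim=-1))
        return z_next, self.reward_head(z_next)

def model_loss(model, obs, act, next_obs, rew):
    """Simple dynamics + reward prediction loss (illustrative, not the paper's objective)."""
    z_next_pred, rew_pred = model(obs, act)
    z_next_tgt = model.encoder(next_obs).detach()  # stop gradient through the target
    return (nn.functional.mse_loss(z_next_pred, z_next_tgt)
            + nn.functional.mse_loss(rew_pred.squeeze(-1), rew))

def pretrain_offline(model, multi_task_batches, epochs=10, lr=1e-3):
    """Stage 1: fit the world model on offline data pooled from several tasks."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act, next_obs, rew in multi_task_batches:
            loss = model_loss(model, obs, act, next_obs, rew)
            opt.zero_grad(); loss.backward(); opt.step()
    return model

def finetune_online(model, env_step_fn, steps=1000, lr=3e-4):
    """Stage 2: keep training the pretrained model on transitions from the new task."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        obs, act, next_obs, rew = env_step_fn()  # one transition from the target task
        loss = model_loss(model, obs, act, next_obs, rew)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

if __name__ == "__main__":
    model = WorldModel(obs_dim=8, act_dim=2)
    # Fake pooled offline data standing in for the pretraining tasks.
    batches = [(torch.randn(32, 8), torch.randn(32, 2),
                torch.randn(32, 8), torch.randn(32)) for _ in range(4)]
    model = pretrain_offline(model, batches, epochs=2)
    # Fake online interaction standing in for the new target task.
    step = lambda: (torch.randn(1, 8), torch.randn(1, 2),
                    torch.randn(1, 8), torch.randn(1))
    model = finetune_online(model, step, steps=10)
```

The key design point the abstract emphasizes is reuse: stage 2 starts from the stage 1 weights rather than a fresh initialization, which is where the reported sample-efficiency gains over training from scratch come from.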