Paper Title

Improving Intrinsic Exploration with Language Abstractions

Paper Authors

Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

Paper Abstract

Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 47-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.
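
To make the core idea concrete, here is a minimal Python sketch of an intrinsic bonus computed over language descriptions of states (e.g., MiniHack text messages) rather than raw observations. This is only an illustrative, assumed simplification: the class name LanguageNoveltyBonus, the scale parameter, and the 1/sqrt(count) decay are hypothetical and do not reproduce the paper's actual L-AMIGo or L-NovelD algorithms.

```python
import math
from collections import defaultdict

class LanguageNoveltyBonus:
    """Illustrative count-based intrinsic bonus keyed on language
    annotations of states instead of raw observations.
    A sketch of the general idea only, not the paper's method."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale
        self.counts = defaultdict(int)  # visitation counts per message

    def bonus(self, message: str) -> float:
        """Return an intrinsic reward that decays as a given language
        description is encountered more often."""
        self.counts[message] += 1
        return self.scale / math.sqrt(self.counts[message])


# Example: novel messages earn a full bonus; repeats earn less.
novelty = LanguageNoveltyBonus()
for msg in ["You see a key.", "You see a key.", "The door opens."]:
    print(msg, "->", round(novelty.bonus(msg), 3))
```

Because the bonus is keyed on abstract descriptions rather than low-level state features, two visually different states with the same description (e.g., seeing a key in different rooms) are treated as the same event, which is the kind of abstraction the abstract argues language provides.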
