Paper Title
Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds
Paper Authors
Paper Abstract
Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available. We find that standard RL baselines make progress on most tasks but are still far from human performance, suggesting Avalon is challenging enough to advance the quest for generalizable RL.