Paper Title
Offline Reinforcement Learning Hands-On
Paper Authors
Paper Abstract
Offline Reinforcement Learning (RL) aims to turn large datasets into powerful decision-making engines without any online interaction with the environment. This great promise has motivated a large amount of research that hopes to replicate the success RL has experienced in simulation settings. This work aims to reflect on these efforts from a practitioner's viewpoint. We start by discussing the dataset properties that we hypothesise can characterise the type of offline methods that will be the most successful. We then verify these claims through a set of experiments on purposely designed datasets generated from environments with both discrete and continuous action spaces. We experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL, and show that behavioural cloning remains a strong contender compared to its contemporaries. Overall, this work stands as a tutorial to help people build their intuition on today's offline RL methods and their applicability.
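Since the abstract singles out behavioural cloning as a strong baseline, a minimal sketch of that baseline may help build intuition: it is plain supervised learning of a policy on the logged state-action pairs, with no environment access. The network shape, data, and hyperparameters below are illustrative assumptions, not the paper's implementation.

```python
# Minimal behavioural-cloning sketch (illustrative only, not the paper's code).
# Assumes a fixed offline dataset of (state, action) pairs with discrete actions.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 4, 2  # hypothetical environment sizes

# Synthetic stand-in for an offline dataset of logged transitions.
states = torch.randn(1024, STATE_DIM)
actions = torch.randint(0, NUM_ACTIONS, (1024,))

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_ACTIONS),  # logits over the discrete action space
)
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    # Behavioural cloning: maximise the log-likelihood of the logged actions.
    logits = policy(states)
    loss = loss_fn(logits, actions)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# At deployment, act greedily with respect to the cloned policy.
greedy_action = policy(states[:1]).argmax(dim=-1)
```

Under this framing, the abstract's finding reads naturally: if the dataset already contains diverse, high-return behaviour, imitating it directly can rival more elaborate offline RL methods.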