Paper Title


Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

Authors

Wonjoon Goo, Scott Niekum

Abstract


We introduce an offline reinforcement learning (RL) algorithm that explicitly clones a behavior policy to constrain value learning. In offline RL, it is often important to prevent a policy from selecting unobserved actions, since the consequence of these actions cannot be presumed without additional information about the environment. One straightforward way to implement such a constraint is to explicitly model a given data distribution via behavior cloning and directly force a policy not to select uncertain actions. However, many offline RL methods instantiate the constraint indirectly -- for example, pessimistic value estimation -- due to a concern about errors when modeling a potentially complex behavior policy. In this work, we argue that it is not only viable but beneficial to explicitly model the behavior policy for offline RL because the constraint can be realized in a stable way with the trained model. We first suggest a theoretical framework that allows us to incorporate behavior-cloned models into value-based offline RL methods, enjoying the strength of both explicit behavior cloning and value learning. Then, we propose a practical method utilizing a score-based generative model for behavior cloning. With the proposed method, we show state-of-the-art performance on several datasets within the D4RL and Robomimic benchmarks and achieve competitive performance across all datasets tested.
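The core idea of the abstract — using an explicit behavior model to forbid a value-greedy policy from selecting unobserved actions — can be illustrated with a minimal sketch. This is not the authors' method: the paper uses a score-based generative model for behavior cloning, whereas the toy below substitutes a kernel density estimate as the behavior model, a hand-coded `q_value` as the critic, and a hypothetical density threshold for the support constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset: 1-D actions the behavior policy actually took.
dataset_actions = rng.normal(loc=0.5, scale=0.1, size=200)

def behavior_density(a, data, bandwidth=0.05):
    """Kernel density estimate standing in for a learned behavior model."""
    z = (a - data) / bandwidth
    return np.mean(np.exp(-0.5 * z**2)) / (bandwidth * np.sqrt(2 * np.pi))

def q_value(a):
    """Hypothetical critic that (wrongly) prefers ever-larger actions,
    including ones never observed in the dataset."""
    return a

candidates = np.linspace(-1.0, 2.0, 301)

# Unconstrained greedy action: exploits out-of-distribution actions.
greedy = max(candidates, key=q_value)  # picks 2.0, far outside the data

# Constrained greedy action: only actions the behavior model deems likely
# (threshold chosen for illustration) are eligible for value maximization.
threshold = 0.5
supported = [a for a in candidates
             if behavior_density(a, dataset_actions) > threshold]
constrained = max(supported, key=q_value)  # stays near the data support

print(greedy, constrained)
```

The unconstrained maximizer drifts to the edge of the candidate range, where the critic's estimate cannot be trusted, while the constrained maximizer stays inside the region the behavior model supports — the stability argument the abstract makes for explicit behavior cloning.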
