Paper Title
When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
Paper Authors
Paper Abstract
In offline reinforcement learning (RL), a detrimental issue for policy learning is the accumulation of errors in the deep Q function in out-of-distribution (OOD) regions. Unfortunately, existing offline RL methods are often over-conservative, inevitably hurting generalization performance outside the data distribution. In our study, one interesting observation is that deep Q functions approximate well inside the convex hull of the training data. Inspired by this, we propose a new method, DOGE (Distance-sensitive Offline RL with better GEneralization). DOGE marries dataset geometry with deep function approximators in offline RL, and enables exploitation in generalizable OOD areas rather than strictly constraining the policy within the data distribution. Specifically, DOGE trains a state-conditioned distance function that can be readily plugged into standard actor-critic methods as a policy constraint. Simple yet elegant, our algorithm achieves better generalization than state-of-the-art methods on the D4RL benchmarks. Theoretical analysis demonstrates the superiority of our approach over existing methods that are based solely on data distribution or support constraints.
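To illustrate how a state-conditioned distance function could be plugged into a standard actor-critic update as a policy constraint, here is a minimal PyTorch sketch. The network sizes, the regression target used to train the distance function, and the penalty weight `alpha` and slack `eps` are illustrative assumptions for the sketch, not the exact objective from the paper.

```python
# Minimal sketch: a state-conditioned distance function g(s, a) used as a policy
# constraint in an actor update. Training signal, hyperparameters, and shapes are
# assumptions made for illustration only.
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


state_dim, action_dim = 17, 6               # placeholder shapes (assumption)
dist_fn = MLP(state_dim + action_dim, 1)    # g(s, a): state-conditioned distance estimate
actor = MLP(state_dim, action_dim)          # deterministic policy pi(s)
critic = MLP(state_dim + action_dim, 1)     # Q(s, a), assumed trained elsewhere

dist_opt = torch.optim.Adam(dist_fn.parameters(), lr=3e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)


def train_distance_step(states, dataset_actions):
    """Illustrative target: regress g(s, a~) onto ||a~ - a_data|| for actions a~
    perturbed around the dataset action (an assumed training signal)."""
    noise = torch.randn_like(dataset_actions) * 0.3
    perturbed = dataset_actions + noise
    target = noise.norm(dim=-1, keepdim=True)
    pred = dist_fn(torch.cat([states, perturbed], dim=-1))
    loss = ((pred - target) ** 2).mean()
    dist_opt.zero_grad()
    loss.backward()
    dist_opt.step()
    return loss.item()


def actor_step(states, alpha=1.0, eps=0.5):
    """Actor update: maximize Q while penalizing the learned distance above a
    slack eps (a Lagrangian-style relaxation; alpha and eps are assumptions)."""
    actions = torch.tanh(actor(states))
    sa = torch.cat([states, actions], dim=-1)
    q = critic(sa)
    dist = dist_fn(sa)
    loss = (-q + alpha * torch.clamp(dist - eps, min=0.0)).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()


# Usage with random placeholder data standing in for an offline batch.
states = torch.randn(64, state_dim)
dataset_actions = torch.tanh(torch.randn(64, action_dim))
train_distance_step(states, dataset_actions)
actor_step(states)
```

The design choice sketched here is that the constraint enters only through the actor loss, so the distance function can be swapped into an existing actor-critic pipeline without altering the critic update.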