Paper Title

Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

Authors

Jiafei Lyu, Xiu Li, Zongqing Lu

Abstract

The learned policy of model-free offline reinforcement learning (RL) methods is often constrained to stay within the support of the dataset to avoid possibly dangerous out-of-distribution actions or states, making it challenging to handle out-of-support regions. Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either a trained forward or reverse dynamics model. However, the imagined transitions may be inaccurate, thus degrading the performance of the underlying offline RL method. In this paper, we propose to augment the offline dataset by using trained bidirectional dynamics models and rollout policies with a double check. We introduce conservatism by trusting samples that the forward model and backward model agree on. Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method. Experimental results on the D4RL benchmarks demonstrate that our method significantly boosts the performance of existing model-free offline RL algorithms and achieves competitive or better scores against baseline methods.
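
To make the "double check" idea in the abstract concrete, below is a minimal illustrative sketch of how a confidence filter over imagined transitions could look. The names `forward_model`, `backward_model`, their `predict` method, and the `threshold` value are assumptions for illustration, not the authors' implementation; the point shown is that a transition is trusted only when the backward model's reconstruction of the starting state agrees with the original state.

```python
import numpy as np

def double_check_filter(forward_model, backward_model, states, actions, threshold=0.1):
    """Illustrative sketch of the 'double check' idea (hypothetical API, not the paper's code).

    The forward model imagines the next state from (s, a); the backward model then
    reconstructs the starting state from (s', a). A transition is kept only when the
    two models agree, i.e. the reconstruction error is below a threshold.
    """
    next_states = forward_model.predict(states, actions)            # s' ~ f(s, a)
    reconstructed = backward_model.predict(next_states, actions)    # s_hat ~ b(s', a)
    disagreement = np.linalg.norm(reconstructed - states, axis=-1)  # per-sample model disagreement
    trusted = disagreement < threshold                              # conservatism: trust agreement only
    return states[trusted], actions[trusted], next_states[trusted]
```

In such a setup, the filtered transitions would be appended to the offline dataset before running any model-free offline RL algorithm on the augmented data.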
