Invariant Causal Prediction for Block MDPs

Paper Title

Invariant Causal Prediction for Block MDPs

Authors

Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup

Abstract

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
