Paper Title
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
Paper Authors
Paper Abstract
Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting in which the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unbounded over time. In this work, we develop a novel MBRL method that (i) relaxes the assumptions on the target transition model, requiring only that it belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step, so that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and the target, allows the distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). The aforementioned compression step is then carried out by greedily retaining only those samples that are more than a certain KSD away from the previous model estimate. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to a 50 percent reduction in wall-clock time in some continuous control environments.
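
The closed-form KSD and the greedy retention rule described in the abstract can be sketched in a few lines of Python. The following is a minimal illustration, not the paper's algorithm: it assumes an RBF base kernel with bandwidth h, a target score function supplied by the caller, and a hypothetical threshold eps for the retention test; the names stein_kernel, ksd_squared, and greedy_coreset are ours, not the authors'.

import numpy as np

def stein_kernel(x, y, score_x, score_y, h=1.0):
    # Stein kernel u_p(x, y) built from the RBF base kernel
    # k(x, y) = exp(-||x - y||^2 / (2 h^2)); score_* = grad log p(.).
    d = x.shape[0]
    diff = x - y
    sq = diff @ diff
    k = np.exp(-sq / (2.0 * h**2))
    return k * (score_x @ score_y
                + (score_x @ diff) / h**2   # s(x)^T grad_y k, divided by k
                - (score_y @ diff) / h**2   # s(y)^T grad_x k, divided by k
                + d / h**2 - sq / h**4)     # trace(grad_x grad_y k) / k

def ksd_squared(samples, scores, h=1.0):
    # V-statistic estimate of KSD^2 between the empirical measure on
    # `samples` and the target whose score produced `scores`.
    n = len(samples)
    return sum(stein_kernel(samples[i], samples[j], scores[i], scores[j], h)
               for i in range(n) for j in range(n)) / n**2

def greedy_coreset(samples, scores, eps=1e-4, h=1.0):
    # Illustrative retention rule: keep a sample only if appending it moves
    # the running KSD^2 estimate by more than eps (a stand-in for the
    # "more than a certain KSD away" criterion in the abstract).
    kept_x, kept_s = [samples[0]], [scores[0]]
    current = ksd_squared(kept_x, kept_s, h)
    for x, s in zip(samples[1:], scores[1:]):
        candidate = ksd_squared(kept_x + [x], kept_s + [s], h)
        if abs(candidate - current) > eps:
            kept_x.append(x)
            kept_s.append(s)
            current = candidate
    return np.array(kept_x)

# Example: standard-normal target, whose score is simply s(x) = -x.
rng = np.random.default_rng(0)
samples = list(rng.normal(size=(100, 2)))
scores = [-x for x in samples]
print(len(greedy_coreset(samples, scores)), "of", len(samples), "retained")

In this toy retention rule, a sample whose addition barely changes the running KSD estimate is treated as statistically redundant and dropped, which is the intuition behind keeping the coreset's size bounded while the full stream of state-action pairs grows with time.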