Paper Title
Latent World Models For Intrinsically Motivated Exploration
Authors
Abstract
In this work, we consider partially observable environments with sparse rewards. We present a self-supervised representation learning method for image-based observations, which arranges embeddings so that they respect the temporal distance between observations. This representation is empirically robust to stochasticity and suitable for novelty detection from the error of a predictive forward model. We consider episodic and life-long uncertainties to guide exploration. We propose to estimate the missing information about the environment with a world model that operates in the learned latent space. To motivate the method, we analyse the exploration problem in a tabular Partially Observable Labyrinth. We demonstrate the method on image-based hard exploration environments from the Atari benchmark and report significant improvement with respect to prior work. The source code for the method and all experiments is available at https://github.com/htdt/lwm.
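As a rough illustration of novelty detection from the error of a predictive forward model in a latent space, the sketch below scores a transition by how badly a forward model predicts the next embedding. It is not the authors' implementation (that lives in the linked repository). The names `linear_forward_model` and `intrinsic_reward`, the toy linear model, and the squared-error score are all assumptions made for illustration.

```python
def linear_forward_model(weights, latent, action_one_hot):
    """Hypothetical toy forward model: next latent = W @ [latent; action]."""
    inputs = list(latent) + list(action_one_hot)
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def intrinsic_reward(predicted_next, actual_next):
    """Assumed novelty score: squared L2 error between two latent vectors."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("latent vectors must have the same dimension")
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


# 2-dimensional latent, 2 discrete actions; the weights are made up.
weights = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]
latent = [0.0, 1.0]
action = [1.0, 0.0]

predicted = linear_forward_model(weights, latent, action)

# A transition the model predicts well yields no exploration bonus.
familiar = intrinsic_reward(predicted, [0.5, 1.0])

# A transition the model predicts poorly yields a larger bonus.
surprising = intrinsic_reward(predicted, [1.5, 1.0])
```

In this toy setting the well-predicted transition receives a smaller bonus than the poorly predicted one, which is the property an exploration signal of this kind relies on.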