Paper Title
Learning Discrete State Abstractions With Deep Variational Inference
Paper Authors
Paper Abstract
Abstraction is crucial for effective sequential decision making in domains with large state spaces. In this work, we propose an information bottleneck method for learning approximate bisimulations, a type of state abstraction. We use a deep neural encoder to map states onto continuous embeddings. We map these embeddings onto a discrete representation using an action-conditioned hidden Markov model, which is trained end-to-end with the neural network. Our method is suited for environments with high-dimensional states and learns from a stream of experience collected by an agent acting in a Markov decision process. Through this learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal Reinforcement Learning setting. We test our method in simplified robotic manipulation domains with image states. We also compare it against previous model-based approaches to finding bisimulations in discrete grid-world-like environments. Source code is available at https://github.com/ondrejba/discrete_abstractions.
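The pipeline the abstract describes (deep encoder → continuous embedding → HMM-style discretization → action-conditioned transitions over abstract states) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all names, dimensions, and the random stand-ins for learned parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 16    # flattened image state (illustrative assumption)
EMBED_DIM = 8     # continuous embedding size (assumption)
NUM_ABS = 4       # number of discrete abstract states (assumption)
NUM_ACTIONS = 3   # assumption

# 1. Stand-in for the deep neural encoder: state -> continuous embedding.
#    (A random linear map here; the paper uses a trained deep network.)
W = rng.normal(size=(STATE_DIM, EMBED_DIM))

def encode(state):
    return np.tanh(state @ W)

# 2. Discretization: soft assignment of an embedding to abstract states,
#    analogous to the emission step of a hidden Markov model.
centroids = rng.normal(size=(NUM_ABS, EMBED_DIM))

def assign(embedding):
    logits = -np.sum((centroids - embedding) ** 2, axis=1)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# 3. Action-conditioned transition model over abstract states:
#    T[a, i, j] = P(next abstract state j | abstract state i, action a).
T = rng.random(size=(NUM_ACTIONS, NUM_ABS, NUM_ABS))
T /= T.sum(axis=2, keepdims=True)

state = rng.normal(size=STATE_DIM)
belief = assign(encode(state))   # distribution over abstract states
next_belief = belief @ T[0]      # predicted distribution after action 0

print(belief.shape, next_belief.shape)
```

In the paper, the encoder, assignments, and transition tables are all learned end-to-end from experience; planning for unseen goals then amounts to searching over the small discrete abstract model rather than the raw image state space.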