Paper Title
Towards Using Fully Observable Policies for POMDPs
Paper Authors
Paper Abstract
The Partially Observable Markov Decision Process (POMDP) is a framework applicable to many real-world problems. In this work, we propose an approach to solving POMDPs with multimodal beliefs that relies on a policy solving the fully observable version of the problem. By defining a new mixture value function built from the value function of the fully observable variant, we can use the corresponding greedy policy to solve the POMDP itself. We develop the mathematical framework needed for this discussion and introduce a benchmark built on the task of Reconnaissance Blind TicTacToe. On this benchmark, we show that our policy outperforms policies that ignore the existence of multiple modes.
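The abstract does not state the exact form of the mixture value function. The sketch below is a minimal illustration under an assumption: the mixture is taken to be a belief-weighted average of the fully observable Q-values over the belief's modes (the QMDP-style heuristic), and those Q-values are assumed to be available as a function `q_full(state, action)`. All names (`mixture_q`, `greedy_mixture_action`, `most_likely_mode_action`) and the toy numbers are hypothetical and not taken from the paper. It contrasts the greedy mixture policy with a baseline that acts only on the most probable mode.

```python
from typing import Callable, Dict, Hashable, Iterable

State = Hashable
Action = Hashable
QFunction = Callable[[State, Action], float]


def mixture_q(belief: Dict[State, float], q_full: QFunction, action: Action) -> float:
    # Hypothetical mixture value: weight each mode's fully observable
    # Q-value by that mode's belief probability (QMDP-style assumption).
    return sum(p * q_full(s, action) for s, p in belief.items())


def greedy_mixture_action(belief: Dict[State, float], q_full: QFunction,
                          actions: Iterable[Action]) -> Action:
    # Greedy policy with respect to the mixture value over all modes.
    return max(actions, key=lambda a: mixture_q(belief, q_full, a))


def most_likely_mode_action(belief: Dict[State, float], q_full: QFunction,
                            actions: Iterable[Action]) -> Action:
    # Baseline that ignores every mode except the single most probable one.
    mode = max(belief, key=belief.get)
    return max(actions, key=lambda a: q_full(mode, a))


if __name__ == "__main__":
    # Toy Q-values for two modes: "a" is best only in s1, "b" only in s2,
    # while "c" is reasonably good in both.
    q_table = {
        ("s1", "a"): 1.0, ("s1", "b"): 0.0, ("s1", "c"): 0.6,
        ("s2", "a"): 0.0, ("s2", "b"): 0.8, ("s2", "c"): 0.6,
    }
    q_full = lambda s, a: q_table[(s, a)]
    belief = {"s1": 0.55, "s2": 0.45}
    actions = ["a", "b", "c"]
    print(greedy_mixture_action(belief, q_full, actions))    # prints c
    print(most_likely_mode_action(belief, q_full, actions))  # prints a
```

In this toy example the mixture values are 0.55 for `a`, 0.36 for `b`, and 0.6 for `c`, so the mixture policy picks the action that is robust across both modes, whereas the single-mode baseline commits to `a`, which is worthless if the true state is `s2`.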