Paper title
Technical Report: The Policy Graph Improvement Algorithm
Paper authors
Paper abstract
Optimizing a policy for a partially observable Markov decision process (POMDP) is challenging. The policy graph improvement (PGI) algorithm for POMDPs represents the policy as a fixed-size policy graph and improves it monotonically. Because the policy size is fixed, the computation time of each improvement iteration is known in advance. Moreover, the method yields compact, understandable policies. This report describes the technical details of the PGI [1] and particle-based PGI [2] algorithms for POMDPs in a more accessible way than [1] or [2], allowing practitioners and students to understand and implement the algorithms.
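
To make the notion of a fixed-size policy graph concrete, the following is a minimal Python sketch of a deterministic policy graph used as a controller: each node stores one action, and each observation selects the next node. The class name `PolicyGraph`, its fields, and the two-node example are illustrative assumptions rather than notation from the report, and the sketch covers only policy representation and execution, not the PGI improvement step itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PolicyGraph:
    """Hypothetical deterministic policy graph with a fixed number of nodes."""

    actions: List[int]           # actions[n]: action executed in node n
    successors: List[List[int]]  # successors[n][o]: next node after observation o

    def __post_init__(self) -> None:
        # The graph size is fixed, so every edge must point to an existing node.
        n_nodes = len(self.actions)
        if len(self.successors) != n_nodes:
            raise ValueError("need one successor row per node")
        for row in self.successors:
            if any(not 0 <= nxt < n_nodes for nxt in row):
                raise ValueError("successor index out of range")

    def run(self, observations: Sequence[int], start: int = 0) -> List[int]:
        """Return the action taken before each observation is received."""
        node = start
        taken = []
        for obs in observations:
            taken.append(self.actions[node])   # act in the current node
            node = self.successors[node][obs]  # follow the observation edge
        return taken


# Illustrative two-node graph with two observations (assumed values):
# the observation index directly selects the next node.
graph = PolicyGraph(actions=[0, 1], successors=[[0, 1], [0, 1]])
action_trace = graph.run([1, 0, 1])
```

Because the number of nodes and edges never changes, the memory needed to store such a policy is fixed up front, which is the property the abstract ties to predictable per-iteration computation time.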