Structural Estimation of Partially Observable Markov Decision Processes

Paper Title

Structural Estimation of Partially Observable Markov Decision Processes

Authors

Chang, Yanling; Garcia, Alfredo; Wang, Zhide; Sun, Lu

Abstract

In many practical settings, control decisions must be made under partial/imperfect information about the evolution of a relevant state variable. Partially Observable Markov Decision Processes (POMDPs) provide a relatively well-developed framework for modeling and analyzing such problems. In this paper, we consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process. We analyze the structural properties of a POMDP model with random rewards and specify conditions under which the model is identifiable without knowledge of the state dynamics. We consider a soft policy gradient algorithm to compute a maximum likelihood estimator and provide a finite-time characterization of convergence to a stationary point. We illustrate the estimation methodology with an application to optimal equipment replacement. In this context, replacement decisions must be made under partial/imperfect information on the true state (i.e., the condition of the equipment). We use synthetic and real data to highlight the robustness of the proposed methodology and characterize the potential for misspecification when partial state observability is ignored.
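As a rough illustration of the setting described in the abstract, the sketch below shows how a decision maker who cannot see the equipment's true condition might track a belief over it, and how a softmax ("soft") policy turns action values into choice probabilities. This is not the authors' algorithm: the two-state machine, the transition matrix, the observation likelihoods, and the function names `belief_update` and `soft_policy` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def belief_update(belief, transition, obs_lik):
    """One Bayes-filter step: predict with the transition matrix, then
    reweight by the likelihood of the observation just received."""
    predicted = belief @ transition            # distribution over the next hidden state
    unnormalized = predicted * obs_lik         # weight by P(observation | next state)
    return unnormalized / unnormalized.sum()   # renormalize to a probability vector

def soft_policy(action_values):
    """Softmax choice probabilities over actions (a 'soft' policy)."""
    shifted = action_values - action_values.max()  # shift for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()

# Hypothetical two-state machine: state 0 = good, state 1 = worn.
transition = np.array([[0.9, 0.1],
                       [0.0, 1.0]])
# Hypothetical likelihood of observing a "noisy" signal in each state.
obs_lik_noisy = np.array([0.2, 0.7])

belief = np.array([1.0, 0.0])                  # start certain the machine is good
belief = belief_update(belief, transition, obs_lik_noisy)

# Hypothetical action values for (keep, replace) at the current belief.
choice_probs = soft_policy(np.array([1.0, 0.5]))
```

In a likelihood-based estimation of this kind, the logarithms of such choice probabilities, evaluated along the observed history, would be summed to form the objective. The belief is the only summary of the hidden state available to the decision maker, which is why ignoring partial observability can lead to a misspecified model.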
