Title
A Note on Optimization Formulations of Markov Decision Processes
Authors
Abstract
This note summarizes the optimization formulations used in the study of Markov decision processes. We consider both discounted and undiscounted processes under the standard and the entropy-regularized settings. For each setting, we first summarize the primal, dual, and primal-dual problems of the linear programming formulation. We then detail the connections between these problems and other formulations for Markov decision processes, such as the Bellman equation and the policy gradient method.
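As a minimal illustration of the connection between the Bellman equation and the linear programming formulation, the sketch below uses the textbook primal constraints for a discounted, reward-maximizing process, v(s) >= r(s,a) + gamma * sum_t P(t|s,a) v(t) for every state-action pair. The two-state process, its numbers, and all function names are hypothetical and chosen only for illustration. The note itself may use different conventions (for example, costs instead of rewards). The code solves the Bellman optimality equation by value iteration and then checks that the resulting value function satisfies every primal constraint, with one tight constraint per state.

```python
# Hypothetical 2-state, 2-action discounted MDP (all numbers are illustrative).
GAMMA = 0.9
# P[s][a][t] = probability of moving from state s to state t under action a
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[s][a] = immediate reward for taking action a in state s
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]


def q_values(v):
    """Return Q[s][a] = R[s][a] + GAMMA * sum_t P[s][a][t] * v[t]."""
    return [
        [R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(len(v)))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


def value_iteration(tol=1e-12, max_iter=100_000):
    """Apply the Bellman optimality operator until the update is below tol."""
    v = [0.0] * len(R)
    for _ in range(max_iter):
        new_v = [max(row) for row in q_values(v)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def primal_slacks(v):
    """Slack v[s] - Q[s][a] of each primal LP constraint (feasible if >= 0)."""
    q = q_values(v)
    return [[v[s] - q[s][a] for a in range(len(q[s]))] for s in range(len(q))]


v_star = value_iteration()
slacks = primal_slacks(v_star)
print("approximate optimal values:", v_star)
print("primal constraint slacks:", slacks)
```

Up to numerical tolerance, every slack is nonnegative and the smallest slack in each state is zero. This is the sense in which the Bellman fixed point is a feasible point of the primal linear program whose tight constraints identify a greedy action in each state.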