A Note on Optimization Formulations of Markov Decision Processes

Title

A Note on Optimization Formulations of Markov Decision Processes

Authors

Lexing Ying, Yuhua Zhu

Abstract

This note summarizes the optimization formulations used in the study of Markov decision processes. We consider both the discounted and undiscounted processes under the standard and the entropy-regularized settings. For each setting, we first summarize the primal, dual, and primal-dual problems of the linear programming formulation. We then detail the connections between these problems and other formulations for Markov decision processes such as the Bellman equation and the policy gradient method.
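The abstract refers to the primal and dual linear programs, the Bellman equation, and the entropy-regularized setting. The following minimal Python sketch illustrates how they connect. It assumes the common textbook form of the discounted LP: the primal minimizes sum_s rho(s) V(s) subject to V(s) >= r(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') for every (s, a). The dual maximizes sum_{s,a} mu(s,a) r(s,a) over occupancy measures mu >= 0 satisfying sum_a mu(s',a) = rho(s') + gamma * sum_{s,a} P(s'|s,a) mu(s,a). The note's own normalization and notation may differ. The toy MDP (`P`, `R`, `RHO`, `GAMMA`), the temperature `tau=0.5`, and helper names such as `occupancy` and `soft_backup` are hypothetical and serve only as illustrations.

The sketch does not call an LP solver. Instead, it solves the Bellman optimality equation by value iteration, checks that V* is primal feasible, and checks that the occupancy measure of the greedy policy attains the same objective value, as strong duality predicts. The entropy-regularized case is illustrated by replacing the max with a temperature-scaled log-sum-exp.

```python
"""Toy check of the LP view of a discounted MDP (illustrative sketch only)."""
import math

# Hypothetical 2-state, 2-action MDP; every number here is made up.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
# P[s][a][t]: probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
# R[s][a]: expected immediate reward for taking action a in state s.
R = [[1.0, 0.0], [0.0, 2.0]]
RHO = [0.5, 0.5]  # positive state weights in the primal objective


def q_value(V, s, a):
    """r(s, a) + gamma * sum_t P(t | s, a) V(t)."""
    return R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in STATES)


def iterate(backup, tol=1e-12):
    """Apply a contraction `backup` from V = 0 until successive iterates agree."""
    V = [0.0 for _ in STATES]
    while True:
        V_new = [backup(V, s) for s in STATES]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def hard_backup(V, s):
    """Standard Bellman optimality backup: max over actions."""
    return max(q_value(V, s, a) for a in ACTIONS)


def soft_backup(V, s, tau=0.5):
    """Entropy-regularized backup: tau * log sum_a exp(Q / tau), computed stably."""
    qs = [q_value(V, s, a) for a in ACTIONS]
    m = max(qs)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in qs))


def occupancy(policy, tol=1e-12):
    """Solve d = rho + gamma * P_pi^T d, then set mu(s, a) = d(s) * [a == policy[s]]."""
    d = list(RHO)
    while True:
        d_new = [
            RHO[t] + GAMMA * sum(P[s][policy[s]][t] * d[s] for s in STATES)
            for t in STATES
        ]
        if max(abs(x - y) for x, y in zip(d, d_new)) < tol:
            break
        d = d_new
    return [[d_new[s] if a == policy[s] else 0.0 for a in ACTIONS] for s in STATES]


V_star = iterate(hard_backup)
pi_star = [max(ACTIONS, key=lambda a: q_value(V_star, s, a)) for s in STATES]
mu_star = occupancy(pi_star)
V_soft = iterate(soft_backup)

# Primal objective sum_s rho(s) V*(s) and dual objective sum_{s,a} mu*(s,a) r(s,a).
primal_obj = sum(RHO[s] * V_star[s] for s in STATES)
dual_obj = sum(mu_star[s][a] * R[s][a] for s in STATES for a in ACTIONS)
# Primal feasibility: V*(s) >= r(s, a) + gamma * E[V*(s')] for every pair (s, a).
primal_feasible = all(
    V_star[s] >= q_value(V_star, s, a) - 1e-9 for s in STATES for a in ACTIONS
)

print("greedy policy:", pi_star)
print("primal objective:", round(primal_obj, 6), "dual objective:", round(dual_obj, 6))
print("primal feasible:", primal_feasible)
print("soft values:", [round(v, 4) for v in V_soft])
```

On this toy example, the two objectives agree up to the iteration tolerance. The soft values lie above V*, because log-sum-exp upper-bounds the max. Fixed-point iteration is used only to keep the sketch dependency-free. For a positive `RHO`, solving the primal LP with any LP solver would return the same V*.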
