Reversible Markov decision processes and the Gaussian free field

Paper title

Reversible Markov decision processes and the Gaussian free field

Author

Anantharam, Venkat

Abstract

A Markov decision problem is called reversible if the stationary controlled Markov chain is reversible under every stationary Markovian strategy. A natural application in which such problems arise is the control of Metropolis-Hastings type dynamics. We characterize all discrete-time reversible Markov decision processes with finite state and action spaces. We show that the policy iteration algorithm for finding an optimal policy can be significantly simplified for Markov decision problems of this type. We also highlight the relation between the finite-time evolution of the accrual of reward and the Gaussian free field associated with the controlled Markov chain.
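As a rough illustration of the reversibility condition mentioned in the abstract, the sketch below builds the transition matrix of a small Metropolis-Hastings chain and checks detailed balance, pi[i] * P[i][j] == pi[j] * P[j][i], for every pair of states. It is not taken from the paper: the function names `metropolis_matrix` and `is_reversible`, the three-state target distribution, and the uniform proposal are all assumptions chosen for illustration, and the code covers a single fixed chain rather than a full Markov decision process.

```python
from fractions import Fraction


def metropolis_matrix(pi, proposal):
    """Transition matrix of a Metropolis-Hastings chain with target pi.

    pi       -- list of positive Fractions summing to 1 (hypothetical target)
    proposal -- row-stochastic matrix of Fractions (hypothetical proposal)
    """
    n = len(pi)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and proposal[i][j] > 0:
                # Hastings acceptance ratio for the move i -> j.
                ratio = (pi[j] * proposal[j][i]) / (pi[i] * proposal[i][j])
                P[i][j] = proposal[i][j] * min(Fraction(1), ratio)
        # Rejected proposals leave the chain where it is.
        P[i][i] = 1 - sum(P[i])
    return P


def is_reversible(pi, P):
    """Check detailed balance pi[i] * P[i][j] == pi[j] * P[j][i] exactly."""
    n = len(pi)
    return all(
        pi[i] * P[i][j] == pi[j] * P[j][i]
        for i in range(n)
        for j in range(n)
    )


# Illustrative data (assumed, not from the paper).
half = Fraction(1, 2)
pi = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
uniform_proposal = [
    [0, half, half],
    [half, 0, half],
    [half, half, 0],
]
P_mh = metropolis_matrix(pi, uniform_proposal)

# A deterministic 3-cycle is stationary for the uniform distribution
# but does not satisfy detailed balance.
uniform = [Fraction(1, 3)] * 3
P_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

mh_reversible = is_reversible(pi, P_mh)
cycle_reversible = is_reversible(uniform, P_cycle)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the detailed balance check is an equality test rather than a floating-point tolerance comparison. In a reversible Markov decision problem, a check of this kind would have to succeed for the chain induced by every stationary Markovian strategy, not just one.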
