Paper Title


Sufficiency of Markov Policies for Continuous-Time Jump Markov Decision Processes

Authors

Feinberg, Eugene A., Mandava, Manasa, Shiryaev, Albert N.

Abstract

This paper extends to Continuous-Time Jump Markov Decision Processes (CTJMDPs) the classic result for Markov Decision Processes stating that, for a given initial state distribution, for every policy there is a (randomized) Markov policy, which can be defined in a natural way, such that at each time instance the marginal distributions of state-action pairs for these two policies coincide. It is shown in this paper that this equality holds for a CTJMDP if the corresponding Markov policy defines a nonexplosive jump Markov process. If this Markov process is explosive, then at each time instance the marginal probability that a state-action pair belongs to a measurable set of state-action pairs is not greater for the described Markov policy than the same probability for the original policy. These results are used in this paper to prove that, for expected discounted total costs and for average costs per unit time, for a given initial state distribution and for each policy for a CTJMDP, the described Markov policy has the same or better performance.
