Title
Constrained Risk-Averse Markov Decision Processes
Authors
Abstract
We consider the problem of designing policies for Markov decision processes (MDPs) with dynamic coherent risk objectives and constraints. We begin by formulating the problem in a Lagrangian framework. Under the assumption that the risk objectives and constraints can be represented by a Markov risk transition mapping, we propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem. We demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. Finally, we illustrate the effectiveness of the proposed method with numerical experiments on a rover navigation problem involving conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
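As a concrete illustration of one of the risk measures named in the abstract, the sketch below computes the conditional value-at-risk (CVaR) of an empirical cost distribution via the standard Rockafellar–Uryasev variational formula. This is a generic illustration of CVaR, not the paper's DCP/DCCP synthesis method; the function name and the quantile-based choice of the minimizer are our own.

```python
import numpy as np

def cvar(costs, alpha):
    """Empirical CVaR at risk level alpha (0 < alpha <= 1).

    Uses the Rockafellar-Uryasev representation
        CVaR_alpha(X) = min_t { t + E[(X - t)_+] / alpha },
    i.e. the expected cost over the worst alpha-fraction of outcomes
    (alpha = 1 recovers the plain expectation).
    """
    costs = np.asarray(costs, dtype=float)
    # For an empirical distribution, the minimizer t is the
    # (1 - alpha)-quantile of the cost samples.
    t = np.quantile(costs, 1.0 - alpha)
    return t + np.mean(np.maximum(costs - t, 0.0)) / alpha
```

For example, `cvar([1, 2, 3, 4], 0.5)` averages the worst half of the outcomes, giving 3.5, while `cvar([1, 2, 3, 4], 1.0)` reduces to the ordinary mean, 2.5. In the paper's setting such risk measures are applied dynamically (via Markov risk transition mappings) rather than to a single static distribution.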