Paper Title
Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes
Paper Authors
Paper Abstract
The parameters of a Markov Decision Process (MDP) often cannot be specified exactly. Uncertain MDPs (UMDPs) capture this model ambiguity by defining sets to which the parameters belong. Minimax regret has been proposed as an objective for planning in UMDPs to find robust policies which are not overly conservative. In this work, we focus on planning for Stochastic Shortest Path (SSP) UMDPs with uncertain cost and transition functions. We introduce a Bellman equation to compute the regret for a policy. We propose a dynamic programming algorithm that utilises the regret Bellman equation, and show that it optimises minimax regret exactly for UMDPs with independent uncertainties. For coupled uncertainties, we extend our approach to use options to enable a trade-off between computation and solution quality. We evaluate our approach on both synthetic and real-world domains, showing that it significantly outperforms existing baselines.
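For context, below is a minimal sketch of the minimax regret objective as it is commonly formulated for UMDPs. The notation is assumed for illustration and may differ from the paper's own: $\Xi$ denotes the uncertainty set of admissible model parameters (cost and transition functions), and $V^{\pi}_{\xi}$ denotes the expected cumulative cost of policy $\pi$ under model realisation $\xi$.

```latex
% Illustrative minimax regret objective for a UMDP (standard formulation,
% not necessarily the paper's exact notation): the regret of a policy under
% a model realisation is its loss relative to the best policy for that
% realisation, and the robust policy minimises the worst-case regret.
\[
\operatorname{regret}(\pi, \xi)
  \;=\; V^{\pi}_{\xi} \;-\; \min_{\pi'} V^{\pi'}_{\xi},
\qquad
\pi^{*} \;\in\; \operatorname*{arg\,min}_{\pi}\,\max_{\xi \in \Xi}\,
  \operatorname{regret}(\pi, \xi).
\]
```

Compared with the standard robust (minimax cost) objective $\min_{\pi}\max_{\xi \in \Xi} V^{\pi}_{\xi}$, the regret formulation penalises a policy only for how much worse it performs than the best achievable policy in each realisation, which is why it tends to yield less conservative behaviour.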