Paper Title

Opportunistic Qualitative Planning in Stochastic Systems with Incomplete Preferences over Reachability Objectives

Authors

Abhishek N. Kulkarni, Jie Fu

Abstract

Preferences play a key role in determining what goals/constraints to satisfy when not all constraints can be satisfied simultaneously. In this paper, we study how to synthesize preference-satisfying plans in stochastic systems, modeled as an MDP, given a (possibly incomplete) combinative preference model over temporally extended goals. We start by introducing new semantics to interpret preferences over infinite plays of the stochastic system. Then, we introduce a new notion of improvement to enable comparison between two prefixes of an infinite play. Based on this, we define two solution concepts called safe and positively improving (SPI) and safe and almost-surely improving (SASI) that enforce improvements with a positive probability and with probability one, respectively. We construct a model called an improvement MDP, in which the synthesis of SPI and SASI strategies that guarantee at least one improvement reduces to computing positive and almost-sure winning strategies in an MDP. We present an algorithm to synthesize the SPI and SASI strategies that induce multiple sequential improvements. We demonstrate the proposed approach using a robot motion planning problem.
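
The reduction described in the abstract bottoms out in two standard computations over MDPs: the positive (probability > 0) and almost-sure (probability 1) winning regions for a reachability objective. The Python sketch below illustrates those two fixed-point computations on a hypothetical toy MDP; it is a minimal sketch of the standard algorithms, not the paper's improvement-MDP construction, and the states, actions, and transition function are invented for the example.

```python
from typing import Dict, Tuple, Set

# Hypothetical toy MDP: delta[(state, action)] = {successor: probability}.
# This illustrates the standard winning-region computations the synthesis
# reduces to; it is NOT the paper's improvement-MDP construction.
MDP = Dict[Tuple[str, str], Dict[str, float]]

delta: MDP = {
    ("s0", "a"): {"s1": 0.5, "s2": 0.5},
    ("s0", "b"): {"s0": 1.0},
    ("s1", "a"): {"goal": 1.0},
    ("s2", "a"): {"s0": 0.6, "trap": 0.4},
    ("trap", "a"): {"trap": 1.0},
    ("goal", "a"): {"goal": 1.0},
}

def states(delta: MDP) -> Set[str]:
    return {s for s, _ in delta} | {t for dist in delta.values() for t in dist}

def positive_attr(delta: MDP, target: Set[str], within: Set[str]) -> Set[str]:
    """States in `within` that reach `target` with positive probability,
    using only actions whose entire support stays inside `within`."""
    win = target & within
    changed = True
    while changed:
        changed = False
        for (s, _a), dist in delta.items():
            succ = set(dist)
            if s in within and s not in win and succ <= within and succ & win:
                win.add(s)
                changed = True
    return win

def almost_sure_reach(delta: MDP, target: Set[str]) -> Set[str]:
    """Largest set from which `target` is reached with probability one:
    shrink the candidate set until its positive attractor is closed."""
    w = states(delta)
    while True:
        r = positive_attr(delta, target, w)
        if r == w:
            return w
        w = r

target = {"goal"}
print("positive   :", sorted(positive_attr(delta, target, states(delta))))
print("almost-sure:", sorted(almost_sure_reach(delta, target)))
```

On this toy MDP, `s0`, `s1`, and `s2` are positively winning (the goal is reachable with some probability), but only `s1` and `goal` are almost-surely winning, since from `s0` every strategy risks falling into `trap` via `s2`.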
