寻找安全的政策区域马尔卡夫决策过程

论文标题

寻找安全的政策区域马尔卡夫决策过程

Finding Safe Zones of policies Markov Decision Processes

论文作者

Cohen, Lee, Mansour, Yishay, Moshkovitz, Michal

论文摘要

鉴于马尔可夫决策过程的策略，我们将安全区域定义为一部分状态，使得该政策的大多数轨迹都局限于该子集。安全区的质量通过状态数量和逃逸概率进行参数，即随机轨迹离开子集的概率。当Safezones拥有少量状态和低逃逸概率时，SafeZones特别有趣。我们研究了寻找最佳安全区的复杂性，并表明总体上，问题在计算上很难。我们的主要结果是使用多项式尺寸样本复杂性的双重标准近似学习算法，对于逃生概率和安全区域的大小，其近似值几乎为$ 2 $。

Given a policy of a Markov Decision Process, we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of a SafeZone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SafeZones are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal SafeZones, and show that in general, the problem is computationally hard. Our main result is a bi-criteria approximation learning algorithm with a factor of almost $2$ approximation for both the escape probability and SafeZone size, using a polynomial size sample complexity.

下载PDF全文

下载文献需遵守相关版权规定

论文标题