论文标题

价值迭代算法的强度多项式,用于计算折扣动态编程的几乎最佳策略

Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming

论文作者

Feinberg, Eugene A., He, Gaojin

论文摘要

本说明提供了对按价值迭代计算所需的操作数量的上限,这是无限马可分子折扣马尔可夫决策过程的几乎最佳政策,并具有有限的状态和行动。对于给定的折现因子,奖励功能的大小以及与最佳性的所需关系,这些上限在州行动对的数量中是强烈的多项式,并且所提供的上限之一具有其属性,即它是折现因子价值的不良函数。

This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing function of the value of the discount factor.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源