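Continuity of Cost in Borkar Control Topology and Implications on Discrete Space and Time Approximations for Controlled Diffusions under Several Criteria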

Paper Title

Continuity of Cost in Borkar Control Topology and Implications on Discrete Space and Time Approximations for Controlled Diffusions under Several Criteria

Authors

Somnath Pradhan, Serdar Yüksel

Abstract

We first show that the discounted cost, cost up to an exit time, and ergodic cost involving controlled non-degenerate diffusions are continuous on the space of stationary control policies when the policies are given a topology introduced by Borkar [V. S. Borkar, A topology for Markov controls, Applied Mathematics and Optimization 20 (1989), 55-62]. The same holds for finite-horizon problems when the control policies are Markov and the topology is revised to include time as a parameter as well. We then establish that finite-action/piecewise-constant stationary policies are dense in the space of stationary Markov policies under this topology. Using the above-mentioned continuity and denseness results, we establish that finite-action/piecewise-constant policies approximate optimal stationary policies with arbitrary precision. This makes many numerical methods, such as policy iteration and stochastic learning methods, applicable to discounted cost, cost up to an exit time, and ergodic cost optimal control problems in continuous time. For the finite-horizon setup, we additionally establish the near optimality of time-discretized policies by an analogous argument. We thus present a unified and concise approach for approximations that applies directly under several commonly adopted cost criteria.
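For orientation, the objects named in the abstract can be written out in a common formulation of relaxed stationary Markov controls for non-degenerate controlled diffusions. This is a sketch under assumptions. The symbols b, σ, c, 𝕌, D, τ_D and α follow usual controlled-diffusion notation and are not taken from the paper. The paper's exact hypotheses and cost forms (growth, regularity, ellipticity and stability conditions, any terminal cost) may differ.

```latex
% Assumed standard setting (not quoted from the paper): compact action space \mathbb{U},
% non-degenerate diffusion, i.e. \sigma\sigma^{\mathsf T} uniformly elliptic.
\[ dX_t = b(X_t, U_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 = x \in \mathbb{R}^d, \quad U_t \in \mathbb{U}. \]

% A stationary (relaxed) Markov policy v : \mathbb{R}^d \to \mathcal{P}(\mathbb{U}) acts by averaging:
\[ b(x, v(x)) = \int_{\mathbb{U}} b(x,\zeta)\, v(x)(d\zeta), \qquad
   c(x, v(x)) = \int_{\mathbb{U}} c(x,\zeta)\, v(x)(d\zeta). \]

% Cost criteria named in the abstract (simplest forms; \tau_D = exit time from a bounded domain D):
\[ J_\alpha^{v}(x) = \mathbb{E}_x^{v}\Big[\int_0^\infty e^{-\alpha t}\, c(X_t, v(X_t))\,dt\Big], \qquad
   J_D^{v}(x) = \mathbb{E}_x^{v}\Big[\int_0^{\tau_D} c(X_t, v(X_t))\,dt\Big], \]
\[ \rho_v(x) = \limsup_{T\to\infty} \frac{1}{T}\,\mathbb{E}_x^{v}\Big[\int_0^T c(X_t, v(X_t))\,dt\Big]. \]

% Borkar topology: v_n \to v iff for all f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d), g \in C_b(\mathbb{R}^d \times \mathbb{U}):
\[ \int_{\mathbb{R}^d} f(x) \int_{\mathbb{U}} g(x,\zeta)\, v_n(x)(d\zeta)\,dx
   \;\longrightarrow\;
   \int_{\mathbb{R}^d} f(x) \int_{\mathbb{U}} g(x,\zeta)\, v(x)(d\zeta)\,dx . \]
```

In this notation, the continuity result stated in the abstract says the following. Under the paper's assumptions, v_n → v in this topology implies J_α^{v_n}(x) → J_α^{v}(x), J_D^{v_n}(x) → J_D^{v}(x) and ρ_{v_n} → ρ_v.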
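The practical consequence named in the abstract is that finite-state tools such as policy iteration become usable once finite-action policies are known to be near optimal. The sketch below illustrates that idea on a toy problem only. It is not the authors' construction.

It assumes a hypothetical one-dimensional problem with these ingredients:

- Dynamics dX = u dt + σ dW on [-L, L], reflecting at the grid edges.
- Running cost c(x, u) = x² + 0.1u², with actions u in [-1, 1] and discount rate α.
- Space discretized by the upwind Markov chain approximation of Kushner and Dupuis, a swapped-in technique rather than anything the abstract specifies.

All function names and parameter values are illustrative.

```python
# Toy illustration (hypothetical problem, not the paper's method): discounted-cost
# policy iteration on an upwind Markov chain approximation of dX = u dt + sigma dW,
# restricted to a finite action set.
import numpy as np


def running_cost(x, u):
    """Hypothetical running cost c(x, u) = x^2 + 0.1 u^2."""
    return x ** 2 + 0.1 * u ** 2


def chain_coefficients(u, h, sigma):
    """Upwind (Kushner-Dupuis style) chain for drift u, diffusion sigma, grid spacing h.

    Returns up/down transition probabilities and the interpolation interval dt.
    """
    a = sigma ** 2
    q = a + h * np.abs(u)                          # normalizer Q_h(u)
    p_up = (a / 2 + h * np.maximum(u, 0.0)) / q
    p_down = (a / 2 + h * np.maximum(-u, 0.0)) / q
    dt = h ** 2 / q
    return p_up, p_down, dt


def neighbours(n):
    """Right/left neighbour indices; at the edges the chain stays put (reflection)."""
    idx = np.arange(n)
    return np.minimum(idx + 1, n - 1), np.maximum(idx - 1, 0)


def evaluate(u, x, h, sigma, alpha):
    """Policy evaluation: solve V = c(x, u) dt + exp(-alpha dt) P V for fixed actions u."""
    n = len(x)
    up, down = neighbours(n)
    p_up, p_down, dt = chain_coefficients(u, h, sigma)
    P = np.zeros((n, n))
    np.add.at(P, (np.arange(n), up), p_up)
    np.add.at(P, (np.arange(n), down), p_down)
    beta = np.exp(-alpha * dt)                     # state-dependent one-step discount
    return np.linalg.solve(np.eye(n) - beta[:, None] * P, running_cost(x, u) * dt)


def q_values(V, x, actions, h, sigma, alpha):
    """One-step lookahead cost of every action in the finite set, at every grid point."""
    n = len(x)
    up, down = neighbours(n)
    Q = np.empty((n, len(actions)))
    for j, a in enumerate(actions):
        u = np.full(n, a)
        p_up, p_down, dt = chain_coefficients(u, h, sigma)
        Q[:, j] = running_cost(x, u) * dt + np.exp(-alpha * dt) * (p_up * V[up] + p_down * V[down])
    return Q


def policy_iteration(actions, L=2.0, h=0.05, sigma=0.5, alpha=1.0, max_iter=500):
    """Discounted-cost policy iteration restricted to a finite action set."""
    x = np.arange(-L, L + h / 2, h)
    actions = np.asarray(actions, dtype=float)
    rows = np.arange(len(x))
    choice = np.zeros(len(x), dtype=int)           # start from actions[0] everywhere
    for _ in range(max_iter):
        V = evaluate(actions[choice], x, h, sigma, alpha)
        Q = q_values(V, x, actions, h, sigma, alpha)
        best = Q.argmin(axis=1)
        # Switch only on strict improvement so tied actions cannot make the loop cycle.
        improve = Q[rows, best] < Q[rows, choice] - 1e-12
        if not improve.any():
            break
        choice = np.where(improve, best, choice)
    return x, V


if __name__ == "__main__":
    # Nested action grids on [-1, 1]: 3, 5, 9, 17 points.
    for k in (3, 5, 9, 17):
        x, V = policy_iteration(np.linspace(-1.0, 1.0, k))
        print(f"{k:2d} actions: V(0) = {V[len(x) // 2]:.6f}")
```

The action grids are nested, so the optimal value of the discretized problem can only decrease as the grid is refined. That nothing is lost in the limit is what the continuity and denseness results establish for the diffusion itself. The toy only illustrates the monotone trend.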
