梯度结合的动态编程，用于具有概率性能的子管道和凹面可扩展的价值功能

论文标题

梯度结合的动态编程，用于具有概率性能的子管道和凹面可扩展的价值功能

Gradient-Bounded Dynamic Programming for Submodular and Concave Extensible Value Functions with Probabilistic Performance Guarantees

论文作者

Lebedev, Denis, Goulart, Paul, Margellos, Kostas

论文摘要

我们考虑使用高维，离散状态空间和有限的离散时间范围的随机动态编程问题，这些问题由于“维度的诅咒”而导致所有状态和时间步骤都禁止从给定的钟形方程中直接计算值的值函数。对于动态程序的价值函数在其状态空间中是可扩展的且supsodular的情况，我们提出了一种新算法，该算法在双动态编程领域中计算确定性的上层和随机的下限。我们表明，所提出的算法在有限数量的迭代后终止。此外，我们在相关策略下累积的价值来得出概率保证，以实现动态程序的单个实现以及对该价值的期望。最后，我们证明了方法在出庭送货上的交付插槽定价的高维数字示例中的功效。

We consider stochastic dynamic programming problems with high-dimensional, discrete state-spaces and finite, discrete-time horizons that prohibit direct computation of the value function from a given Bellman equation for all states and time steps due to the "curse of dimensionality". For the case where the value function of the dynamic program is concave extensible and submodular in its state-space, we present a new algorithm that computes deterministic upper and stochastic lower bounds of the value function in the realm of dual dynamic programming. We show that the proposed algorithm terminates after a finite number of iterations. Furthermore, we derive probabilistic guarantees on the value accumulated under the associated policy for a single realisation of the dynamic program and for the expectation of this value. Finally, we demonstrate the efficacy of our approach on a high-dimensional numerical example from delivery slot pricing in attended home delivery.

下载PDF全文

下载文献需遵守相关版权规定

论文标题