Reinforcement Learning via Parametric Cost Function Approximation for Multistage Stochastic Programming

Paper title

Reinforcement Learning via Parametric Cost Function Approximation for Multistage Stochastic Programming

Authors

Saeed Ghadimi, Raymond T. Perkins, Warren B. Powell

Abstract

The most common approaches for solving stochastic resource allocation problems in the research literature are to use either value functions ("dynamic programming") or scenario trees ("stochastic programming") to approximate the impact of a decision now on the future. By contrast, common industry practice is to use a deterministic approximation of the future, which is easier to understand and solve but is criticized for ignoring uncertainty. We show that a parameterized version of a deterministic lookahead can be an effective way of handling uncertainty while retaining the computational simplicity of a deterministic lookahead. We present the parameterized lookahead model as a form of policy for solving a stochastic base model, which is used as the basis for optimizing the parameterized policy. This approach can handle complex, high-dimensional state variables, and avoids the usual approximations associated with scenario trees. We formalize this approach and demonstrate its use in the context of a complex, nonstationary energy storage problem.
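The two ingredients named in the abstract, a parameterized deterministic lookahead used as a policy and a stochastic base model used to tune it, can be written schematically as below. The notation is assumed for illustration and does not appear in the abstract: $S_t$ is the state, $x_t$ the decision, $C$ a one-period contribution, $f_{t,t'}$ a point forecast made at time $t$ for time $t'$, $H$ the lookahead horizon, $S^M$ the transition function, $W_{t+1}$ the new information, and $\theta$ a vector of tunable coefficients that here scale the forecasts. This is a sketch of the general pattern under those assumptions, not necessarily the authors' exact formulation.

```latex
% Parameterized deterministic lookahead policy (illustrative notation).
% Tilde variables live inside the lookahead model; only x_t is implemented.
\begin{align}
X^{\pi}_t(S_t \mid \theta)
  &= \arg\max_{x_t,\, \tilde{x}_{t,t+1}, \dots, \tilde{x}_{t,t+H}}
     \Big( C(S_t, x_t) + \sum_{t'=t+1}^{t+H} C(\tilde{S}_{t,t'}, \tilde{x}_{t,t'}) \Big), \\
% Assumed parameterization: the lookahead sees the forecast f_{t,t'}
% scaled by \theta_{t'-t} in place of the unknown future information.
\tilde{S}_{t,t'+1}
  &= S^{M}\big(\tilde{S}_{t,t'}, \tilde{x}_{t,t'}, \theta_{t'+1-t}\, f_{t,t'+1}\big),
     \qquad \tilde{S}_{t,t} = S_t,\ \tilde{x}_{t,t} = x_t, \\
% Policy search in the stochastic base model: choose \theta to maximize
% expected cumulative contribution when the policy is simulated.
\max_{\theta} F(\theta)
  &= \mathbb{E}\Big[ \sum_{t=0}^{T} C\big(S_t, X^{\pi}_t(S_t \mid \theta)\big) \Big],
     \qquad S_{t+1} = S^{M}\big(S_t, X^{\pi}_t(S_t \mid \theta), W_{t+1}\big).
\end{align}
```

Under this assumed parameterization, setting every component of $\theta$ to 1 recovers the plain deterministic lookahead, and the expectation in $F(\theta)$ would typically be estimated by simulating the policy over sampled paths of $W$.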
