线性收敛的约束马尔可夫决策过程的算法

论文标题

线性收敛的约束马尔可夫决策过程的算法

Algorithm for Constrained Markov Decision Process with Linear Convergence

论文作者

Gladin, Egor, Lavrik-Karmazin, Maksim, Zainullina, Karina, Rudenko, Varvara, Gasnikov, Alexander, Takáč, Martin

论文摘要

考虑了约束马尔可夫决策过程的问题。代理商旨在最大程度地提高预期的累积折扣奖励，但要受到其成本的多个限制（约束数量相对较少）。提出了一种新的双重方法，并集成了两种成分：熵正规策略优化器和Vaidya的双优化器，这对于实现更快的融合至关重要。提供了建议方法的有限时间误差绑定。尽管非循环客观的挑战受到非循环约束的挑战，但所提出的方法表明将（线性速率）收敛到整体最佳。根据现有的原始偶对偶的方法，根据最佳差距和约束违规表示的复杂性显着改善。

The problem of constrained Markov decision process is considered. An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs (the number of constraints is relatively small). A new dual approach is proposed with the integration of two ingredients: entropy regularized policy optimizer and Vaidya's dual optimizer, both of which are critical to achieve faster convergence. The finite-time error bound of the proposed approach is provided. Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge (with linear rate) to the global optimum. The complexity expressed in terms of the optimality gap and the constraint violation significantly improves upon the existing primal-dual approaches.

下载PDF全文

下载文献需遵守相关版权规定

论文标题