Paper Title

Robust Constrained Reinforcement Learning

Paper Authors

Yue Wang, Fei Miao, Shaofeng Zou

Paper Abstract

Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may differ from the test environment due to, e.g., modeling error, adversarial attacks, or non-stationarity, resulting in severe performance degradation and, more importantly, constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set; the goal is to guarantee that the constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach and theoretically establish guarantees on its convergence, complexity, and robust feasibility. We then investigate a concrete example of the $\delta$-contamination uncertainty set, design an online, model-free algorithm, and theoretically characterize its sample complexity.
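
To make the formulation concrete, here is a minimal LaTeX sketch of the objective the abstract describes, together with the standard definition of the $\delta$-contamination set. The notation ($J_r$, $J_c$ for the reward and utility value functions, $b$ for the constraint threshold, $p^0$ for the nominal transition kernel, $\Delta(\mathcal{S})$ for the simplex over states) is our own shorthand, not taken from the paper.

```latex
% Robust constrained RL: maximize the worst-case reward subject to the
% utility constraint holding for every MDP in the uncertainty set P.
\begin{align}
  \max_{\pi} \;& \min_{P \in \mathcal{P}} \; J_r(\pi, P)
    && \text{(worst-case reward over } \mathcal{P}\text{)} \\
  \text{s.t.} \;& \min_{P \in \mathcal{P}} \; J_c(\pi, P) \ge b
    && \text{(constraint satisfied for all } P \in \mathcal{P}\text{)}
\end{align}
% The delta-contamination set mixes the nominal kernel p^0 with an
% arbitrary distribution q over states, independently for each (s, a):
\begin{equation}
  \mathcal{P}_{s,a} = \bigl\{ (1-\delta)\, p^0_{s,a} + \delta\, q : q \in \Delta(\mathcal{S}) \bigr\},
  \qquad 0 \le \delta < 1.
\end{equation}
```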
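And here is a hypothetical, planner-style Python sketch of a robust primal-dual loop on a tabular MDP under $\delta$-contamination. The paper's actual algorithm is online and model-free; this sketch only illustrates the primal-dual structure, using the closed form of the $\delta$-contamination worst case (the adversary shifts a $\delta$ fraction of transition mass onto the lowest-value state). All names (`robust_value`, `robust_primal_dual`), the softmax-logit primal step, and the step sizes are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax_rows(theta):
    """Row-wise softmax: logits of shape (S, A) -> policy pi(a|s)."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def robust_value(policy, P0, R, delta, gamma, n_iter=200):
    """Worst-case value under delta-contamination: the robust expectation
    of V(s') is (1 - delta) * E_{P0}[V] + delta * min_s V(s)."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        EV = np.einsum('sap,p->sa', P0, V)       # nominal E[V(s') | s, a]
        Q = R + gamma * ((1 - delta) * EV + delta * V.min())
        V = np.einsum('sa,sa->s', policy, Q)     # V(s) = sum_a pi(a|s) Q(s,a)
    return V

def robust_primal_dual(P0, R, C, b, delta=0.1, gamma=0.9,
                       lr_pi=0.5, lr_lam=0.5, iters=300):
    """Illustrative primal-dual loop (NOT the paper's online algorithm):
    ascend a softmax policy on the robust Lagrangian, descend on lambda."""
    S, A = R.shape
    rho = np.full(S, 1.0 / S)      # assumed uniform initial distribution
    theta = np.zeros((S, A))       # policy logits
    lam = 0.0                      # dual variable for the utility constraint
    for _ in range(iters):
        policy = softmax_rows(theta)
        Vr = robust_value(policy, P0, R, delta, gamma)  # robust reward value
        Vc = robust_value(policy, P0, C, delta, gamma)  # robust utility value
        # Primal step: nudge logits toward actions with higher robust
        # Lagrangian Q-values (a crude stand-in for a policy gradient).
        EVr = np.einsum('sap,p->sa', P0, Vr)
        EVc = np.einsum('sap,p->sa', P0, Vc)
        Q = (R + lam * C) + gamma * ((1 - delta) * (EVr + lam * EVc)
                                     + delta * (Vr.min() + lam * Vc.min()))
        theta += lr_pi * (Q - Q.mean(axis=1, keepdims=True))
        # Dual step: raise lambda while the worst-case utility rho @ Vc
        # falls short of the threshold b; project back onto lambda >= 0.
        lam = max(0.0, lam - lr_lam * (rho @ Vc - b))
    return softmax_rows(theta), lam

# Toy usage: random nominal kernel, reward, and utility; 4 states, 2 actions.
rng = np.random.default_rng(0)
P0 = rng.random((4, 2, 4)); P0 /= P0.sum(axis=2, keepdims=True)
R, C = rng.random((4, 2)), rng.random((4, 2))
policy, lam = robust_primal_dual(P0, R, C, b=3.0)
```

The dual step is the mechanism behind the robust-feasibility guarantee the abstract mentions: lambda grows whenever the worst-case utility dips below the threshold b, penalizing infeasible policies, and relaxes toward zero once the constraint holds over the whole uncertainty set.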
