Paper Title

Robust Constrained Reinforcement Learning

Paper Authors

Yue Wang, Fei Miao, Shaofeng Zou

Paper Abstract

Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may differ from the test environment due to, e.g., modeling error, adversarial attacks, or non-stationarity, resulting in severe performance degradation and, more importantly, constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set; the goal is to guarantee that the constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach and theoretically establish guarantees on its convergence, complexity, and robust feasibility. We then investigate a concrete example of the $\delta$-contamination uncertainty set, design an online, model-free algorithm, and theoretically characterize its sample complexity.
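
To make the formulation concrete, here is a minimal LaTeX sketch of the objective the abstract describes, together with the standard definition of the $\delta$-contamination set. The notation ($J_r$, $J_c$ for the reward and utility value functions, $b$ for the constraint threshold, $p^0$ for the nominal transition kernel, $\Delta(\mathcal{S})$ for the simplex over states) is our own shorthand, not taken from the paper.

```latex
% Robust constrained RL: maximize the worst-case reward subject to the
% utility constraint holding for every MDP in the uncertainty set P.
\begin{align}
  \max_{\pi} \;& \min_{P \in \mathcal{P}} \; J_r(\pi, P)
    && \text{(worst-case reward over } \mathcal{P}\text{)} \\
  \text{s.t.} \;& \min_{P \in \mathcal{P}} \; J_c(\pi, P) \ge b
    && \text{(constraint satisfied for all } P \in \mathcal{P}\text{)}
\end{align}
% The delta-contamination set mixes the nominal kernel p^0 with an
% arbitrary distribution q over states, independently for each (s, a):
\begin{equation}
  \mathcal{P}_{s,a} = \bigl\{ (1-\delta)\, p^0_{s,a} + \delta\, q : q \in \Delta(\mathcal{S}) \bigr\},
  \qquad 0 \le \delta < 1.
\end{equation}
```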
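And here is a hypothetical, planner-style Python sketch of a robust primal-dual loop on a tabular MDP under $\delta$-contamination. The paper's actual algorithm is online and model-free; this sketch only illustrates the primal-dual structure, using the closed form of the $\delta$-contamination worst case (the adversary shifts a $\delta$ fraction of transition mass onto the lowest-value state). All names (`robust_value`, `robust_primal_dual`), the softmax-logit primal step, and the step sizes are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax_rows(theta):
    """Row-wise softmax: logits of shape (S, A) -> policy pi(a|s)."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def robust_value(policy, P0, R, delta, gamma, n_iter=200):
    """Worst-case value under delta-contamination: the robust expectation
    of V(s') is (1 - delta) * E_{P0}[V] + delta * min_s V(s)."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        EV = np.einsum('sap,p->sa', P0, V)       # nominal E[V(s') | s, a]
        Q = R + gamma * ((1 - delta) * EV + delta * V.min())
        V = np.einsum('sa,sa->s', policy, Q)     # V(s) = sum_a pi(a|s) Q(s,a)
    return V

def robust_primal_dual(P0, R, C, b, delta=0.1, gamma=0.9,
                       lr_pi=0.5, lr_lam=0.5, iters=300):
    """Illustrative primal-dual loop (NOT the paper's online algorithm):
    ascend a softmax policy on the robust Lagrangian, descend on lambda."""
    S, A = R.shape
    rho = np.full(S, 1.0 / S)      # assumed uniform initial distribution
    theta = np.zeros((S, A))       # policy logits
    lam = 0.0                      # dual variable for the utility constraint
    for _ in range(iters):
        policy = softmax_rows(theta)
        Vr = robust_value(policy, P0, R, delta, gamma)  # robust reward value
        Vc = robust_value(policy, P0, C, delta, gamma)  # robust utility value
        # Primal step: nudge logits toward actions with higher robust
        # Lagrangian Q-values (a crude stand-in for a policy gradient).
        EVr = np.einsum('sap,p->sa', P0, Vr)
        EVc = np.einsum('sap,p->sa', P0, Vc)
        Q = (R + lam * C) + gamma * ((1 - delta) * (EVr + lam * EVc)
                                     + delta * (Vr.min() + lam * Vc.min()))
        theta += lr_pi * (Q - Q.mean(axis=1, keepdims=True))
        # Dual step: raise lambda while the worst-case utility rho @ Vc
        # falls short of the threshold b; project back onto lambda >= 0.
        lam = max(0.0, lam - lr_lam * (rho @ Vc - b))
    return softmax_rows(theta), lam

# Toy usage: random nominal kernel, reward, and utility; 4 states, 2 actions.
rng = np.random.default_rng(0)
P0 = rng.random((4, 2, 4)); P0 /= P0.sum(axis=2, keepdims=True)
R, C = rng.random((4, 2)), rng.random((4, 2))
policy, lam = robust_primal_dual(P0, R, C, b=3.0)
```

The dual step is the mechanism behind the robust-feasibility guarantee the abstract mentions: lambda grows whenever the worst-case utility dips below the threshold b, penalizing infeasible policies, and relaxes toward zero once the constraint holds over the whole uncertainty set.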
