Paper Title

Safety-guaranteed Reinforcement Learning based on Multi-class Support Vector Machine

Authors

Kwangyeon Kim, Akshita Gupta, Hong-Cheol Choi, Inseok Hwang

Abstract

Several works have addressed the problem of incorporating constraints into the reinforcement learning (RL) framework; however, the majority of them can only guarantee the satisfaction of soft constraints. In this work, we address the problem of satisfying hard state constraints in a model-free RL setting with deterministic system dynamics. The proposed algorithm is developed for discrete state and action spaces and utilizes a multi-class support vector machine (SVM) to represent the policy. The state constraints are incorporated into the SVM optimization framework to derive an analytical solution for determining the policy parameters. The final policy converges to a solution that is guaranteed to satisfy the constraints. Additionally, the proposed formulation adheres to the Q-learning framework and thus also guarantees convergence to the optimal solution. The algorithm is demonstrated on multiple example problems.
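As a rough illustration of the setting the abstract describes, the sketch below runs tabular Q-learning on a toy grid world in which the greedy policy is read out through per-action linear score functions (standing in for the multi-class SVM policy) and any action whose deterministic successor violates a hard state constraint is masked out. The grid world, feature map, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): Q-learning with a hard state
# constraint enforced by masking unsafe actions under deterministic dynamics,
# and a per-action linear score matrix W standing in for the multi-class SVM policy.
import numpy as np

N = 5                                          # 5x5 grid, states indexed 0..24
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
FORBIDDEN = {12}                               # hard constraint: never enter state 12
GOAL = 24

def step(s, a):
    # Deterministic transition with walls at the grid boundary.
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    s2 = nr * N + nc
    return s2, (1.0 if s2 == GOAL else -0.01)

def safe_actions(s):
    # Exclude any action whose deterministic successor is a forbidden state.
    return [a for a in range(len(ACTIONS)) if step(s, a)[0] not in FORBIDDEN]

# One-hot state features, so W[a, s] is the linear score of action a in state s.
W = np.zeros((len(ACTIONS), N * N))
Q = np.zeros((N * N, len(ACTIONS)))
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.5, 0.95, 0.2

for episode in range(500):
    s = 0
    for _ in range(100):
        acts = safe_actions(s)
        if rng.random() < eps:
            a = int(rng.choice(acts))
        else:
            scores = W[:, s]                        # per-action scores at state s
            a = max(acts, key=lambda i: scores[i])  # greedy among safe actions only
        s2, r = step(s, a)
        target = r + gamma * max(Q[s2, a2] for a2 in safe_actions(s2))
        Q[s, a] += alpha * (target - Q[s, a])
        W[a, s] = Q[s, a]                           # keep the linear scores in sync
        s = s2
        if s == GOAL:
            break

print("Greedy safe action from the start state:",
      max(safe_actions(0), key=lambda a: W[a, 0]))
```

Because the dynamics are deterministic, masking actions whose successor is forbidden is enough to keep every visited state feasible, which mirrors the hard-constraint guarantee the abstract claims; the paper itself derives the constrained policy analytically inside the SVM optimization rather than by copying Q-values as done here.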
