SAMBA: Safe Model-Based & Active Reinforcement Learning

Paper Title

SAMBA: Safe Model-Based & Active Reinforcement Learning

Authors

Cowen-Rivers, Alexander I.; Palenicek, Daniel; Moens, Vincent; Abdullah, Mohammed; Sootla, Aivar; Wang, Jun; Ammar, Haitham

Abstract

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation, optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations. Our results show orders-of-magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
