Paper Title

Reward-Free Policy Space Compression for Reinforcement Learning

Paper Authors

Mirco Mutti, Stefano Del Col, Marcello Restelli

Paper Abstract

In reinforcement learning, we encode the potential behaviors of an agent interacting with an environment into an infinite set of policies, the policy space, typically represented by a family of parametric functions. Dealing with such a policy space is a hefty challenge, which often causes sample and computation inefficiencies. However, we argue that a limited number of policies are actually relevant when we also account for the structure of the environment and of the policy parameterization, as many of them would induce very similar interactions, i.e., state-action distributions. In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies, such that, given any policy $π$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $π$ is bounded. We show that this compression of the policy space can be formulated as a set cover problem, and it is inherently NP-hard. Nonetheless, we propose a game-theoretic reformulation for which a locally optimal solution can be efficiently found by iteratively stretching the compressed space to cover an adversarial policy. Finally, we provide an empirical evaluation to illustrate the compression procedure in simple domains, and its ripple effects in reinforcement learning.
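
To make the covering condition concrete: the goal is a finite set of representatives $\{\pi_1, \dots, \pi_k\}$ such that, for every policy $π$ in the space, $\min_i D_\alpha(d^{π} \,\|\, d^{\pi_i}) \le \epsilon$, where $d^{π}$ is the state-action distribution induced by $π$ (the threshold $\epsilon$, the order $\alpha$, and the direction of the divergence are our notation for illustration, not necessarily the paper's). Below is a minimal sketch of the stretch-to-cover loop on a toy finite candidate set, where each policy is identified with its state-action distribution; the greedy enumeration only stands in for the paper's game-theoretic procedure and is an assumption, not its actual algorithm.

```python
import numpy as np

def renyi_divergence(p, q, alpha=2.0, tiny=1e-12):
    """Rényi divergence of order alpha between two discrete distributions."""
    p = np.asarray(p, dtype=float) + tiny
    q = np.asarray(q, dtype=float) + tiny
    return np.log(np.sum(p**alpha / q**(alpha - 1.0))) / (alpha - 1.0)

def compress(policy_dists, eps=0.5, alpha=2.0):
    """Greedily grow a set of representative indices until every candidate
    state-action distribution lies within eps (in Rényi divergence) of some
    representative: a local heuristic for the set-cover objective."""
    reps = [0]  # start from an arbitrary representative policy
    while True:
        # Adversary step: find the policy that is worst covered by the current set.
        gaps = [min(renyi_divergence(d, policy_dists[r], alpha) for r in reps)
                for d in policy_dists]
        worst = int(np.argmax(gaps))
        if gaps[worst] <= eps:   # every candidate is eps-covered: stop
            return reps
        reps.append(worst)       # stretch the compressed space to cover it

# Toy usage: 200 random distributions standing in for the state-action
# distributions induced by a discretized policy space.
rng = np.random.default_rng(0)
dists = rng.dirichlet(np.ones(6), size=200)
print(compress(dists))
```

In the paper's setting the policy space is parametric and continuous, so the adversarial (worst-covered) policy would be found by optimization rather than by scanning a finite grid; the sketch only mirrors the alternation between an adversary and a covering step.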
