Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

Paper Title

Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

Authors

Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto, Jordi Torres

Abstract

Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection between option discovery and information theory. Information-theoretic skill discovery methods have garnered much interest from the community, but little research has been conducted in understanding their limitations. Through theoretical analysis and empirical evidence, we show that existing algorithms suffer from a common limitation -- they discover options that provide a poor coverage of the state space. In light of this, we propose 'Explore, Discover and Learn' (EDL), an alternative approach to information-theoretic skill discovery. Crucially, EDL optimizes the same information-theoretic objective derived from the empowerment literature, but addresses the optimization problem using different machinery. We perform an extensive evaluation of skill discovery methods on controlled environments and show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned. Code is publicly available at https://github.com/victorcampos7/edl.
