Paper Title
Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning
Paper Authors
Paper Abstract
It is desirable for an agent to be able to solve a rich variety of problems that can be specified through language in the same environment. A popular approach towards obtaining such agents is to reuse skills learned in prior tasks to generalise compositionally to new ones. However, this is a challenging problem due to the curse of dimensionality induced by the combinatorially large number of ways high-level goals can be combined both logically and temporally in language. To address this problem, we propose a framework where an agent first learns a sufficient set of skill primitives to achieve all high-level goals in its environment. The agent can then flexibly compose them both logically and temporally to provably achieve temporal logic specifications in any regular language, such as regular fragments of linear temporal logic. This provides the agent with the ability to map from complex temporal logic task specifications to near-optimal behaviours zero-shot. We demonstrate this experimentally in a tabular setting, as well as in a high-dimensional video game and continuous control environment. Finally, we also demonstrate that the performance of skill machines can be improved with regular off-policy reinforcement learning algorithms when optimal behaviours are desired.
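The abstract's core idea of logically composing learned skills can be illustrated with a small sketch. Below, two hypothetical goal-conditioned tabular Q-functions (`q_get_key`, `q_open_door` are invented names, and the values are illustrative) are composed via elementwise `min` for conjunction and `max` for disjunction, a construction common in compositional RL; this is a sketch of the general idea, not the paper's exact formulation:

```python
import numpy as np

# Hypothetical Q-values for two skill primitives in a tabular setting
# (rows = states, cols = actions). Numbers are purely illustrative.
q_get_key   = np.array([[0.9, 0.1], [0.2, 0.8]])
q_open_door = np.array([[0.3, 0.7], [0.6, 0.4]])

# Logical composition of skills through their value functions:
# elementwise min approximates AND (achieve both goals),
# elementwise max approximates OR (achieve either goal).
q_and = np.minimum(q_get_key, q_open_door)
q_or  = np.maximum(q_get_key, q_open_door)

# Acting greedily on the composed value function yields a policy for the
# composed task with no further learning (the zero-shot behaviour the
# abstract refers to).
policy_and = q_and.argmax(axis=1)
```

Temporal composition would additionally track progress through the task specification (e.g. with a finite-state machine derived from the temporal logic formula) and switch which logical composition is active in each automaton state.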