Paper Title
A Boolean Task Algebra for Reinforcement Learning
Paper Authors
Paper Abstract
The ability to compose learned skills to solve new tasks is an important property of lifelong-learning agents. In this work, we formalise the logical composition of tasks as a Boolean algebra. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks. We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains---including a high-dimensional video game environment requiring function approximation---where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks.
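The abstract does not say which operators are used to compose the value functions, so the following is only a minimal sketch under stated assumptions: disjunction is taken as an element-wise maximum, conjunction as an element-wise minimum, and negation as a reflection between the value tables of a "maximum" and a "minimum" task. The goal names, the 0/1 toy tables, and the helpers `table`, `q_or`, `q_and`, `q_not`, and `closure` are all hypothetical and stand in for learned goal-oriented value functions.

```python
# Toy illustration only: four goals and hand-written value tables,
# not learned value functions from the paper.
GOALS = ("top_left", "top_right", "bottom_left", "bottom_right")


def table(wanted):
    """Goal-indexed value table: 1.0 if the task wants the goal, else 0.0."""
    return tuple(1.0 if g in wanted else 0.0 for g in GOALS)


Q_MAX = table(GOALS)  # assumed "maximum" task: every goal is desirable
Q_MIN = table(())     # assumed "minimum" task: no goal is desirable


def q_or(q1, q2):
    """Assumed disjunction: element-wise maximum."""
    return tuple(max(a, b) for a, b in zip(q1, q2))


def q_and(q1, q2):
    """Assumed conjunction: element-wise minimum."""
    return tuple(min(a, b) for a, b in zip(q1, q2))


def q_not(q):
    """Assumed negation: reflect each value between Q_MAX and Q_MIN."""
    return tuple((hi + lo) - v for hi, lo, v in zip(Q_MAX, Q_MIN, q))


def closure(base):
    """All distinct tables reachable from `base` via not/or/and."""
    known = set(base)
    while True:
        new = {q_not(q) for q in known}
        new |= {op(a, b) for a in known for b in known for op in (q_or, q_and)}
        if new <= known:
            return known
        known |= new


# Two base tasks over the four goals.
TOP = table(("top_left", "top_right"))
LEFT = table(("top_left", "bottom_left"))

# "top AND NOT left" is composed without any further learning.
top_not_left = q_and(TOP, q_not(LEFT))
best_goal = GOALS[max(range(len(GOALS)), key=lambda i: top_not_left[i])]

# Every task expressible from the two base tasks.
all_tasks = closure([TOP, LEFT])
```

In this toy setting the two base tasks generate 2^(2^2) = 16 distinct tasks, which illustrates how the number of composable tasks can grow super-exponentially in the number of base tasks.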