Paper Title

Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity

Paper Authors

Lin Guan, Sarath Sreedharan, Subbarao Kambhampati

Paper Abstract

Creating reinforcement learning (RL) agents that are capable of accepting and leveraging task-specific knowledge from humans has long been identified as a possible strategy for developing scalable approaches to solving long-horizon problems. While previous works have looked at the possibility of using symbolic models along with RL approaches, they tend to assume that the high-level action models are executable at the low level and that the fluents can exclusively characterize all desirable MDP states. Symbolic models of real-world tasks are, however, often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, wherein we formalize the relationship between the symbolic model and the underlying MDP, which allows us to characterize the incompleteness of the symbolic model. We use these models to extract high-level landmarks that are used to decompose the task. At the low level, we learn a set of diverse policies for each possible task subgoal identified by the landmarks, which are then stitched together. We evaluate our system on three different benchmark domains and show that, even with incomplete symbolic-model information, our approach is able to discover the task structure and efficiently guide the RL agent towards the goal.
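
To make the pipeline described in the abstract concrete, below is a minimal Python sketch of landmark-guided skill stitching. This is an illustration under assumptions, not the authors' implementation: ToyEnv, rollout, stitch_skills, and the LANDMARKS list are all hypothetical names, and the stochastic rollout merely stands in for trained RL policies.

# Minimal illustrative sketch (assumptions, not the authors' code): a landmark
# sequence from an approximate symbolic model decomposes the task; several
# diverse low-level policies are kept per landmark, and execution falls back
# to an alternative policy when the current one stalls.
import random

# Hypothetical landmark sequence, as a planner might extract it from an
# approximate symbolic model of a key-door navigation task.
LANDMARKS = ["has_key", "door_open", "at_goal"]


class ToyEnv:
    """Stand-in environment. A rollout succeeds stochastically, modeling the
    fact that a low-level state satisfying a landmark may still be one from
    which the next subgoal is unreachable (symbolic-model incompleteness)."""

    def rollout(self, policy_id, trajectory, landmark):
        reached = random.random() < 0.7  # placeholder success probability
        return reached, trajectory + [landmark] if reached else trajectory


def stitch_skills(env, n_diverse=3, seed=0):
    """Drive the agent landmark by landmark, trying each diverse policy for
    the current subgoal until one succeeds, then moving on to the next."""
    random.seed(seed)
    trajectory = []
    for lm in LANDMARKS:
        # In the paper's setting these policies would be trained with a
        # diversity objective; here they are named stubs.
        policies = [f"pi[{lm}]#{i}" for i in range(n_diverse)]
        for pi in policies:
            ok, trajectory_next = env.rollout(pi, trajectory, lm)
            if ok:
                print(f"{pi} reached landmark '{lm}'")
                trajectory = trajectory_next
                break
        else:  # no diverse policy reached this landmark
            print(f"all policies failed at '{lm}'; the skill set is exhausted")
            return None
    return trajectory


if __name__ == "__main__":
    print("landmarks achieved in order:", stitch_skills(ToyEnv()))

The fallback over diverse policies is what hedges against the symbolic model's incompleteness: if one policy reaches a landmark in a low-level state from which the next subgoal cannot be achieved, another policy in the set may terminate in a more useful state.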
