TOMA: Topological Map Abstraction for Reinforcement Learning

Paper title

TOMA: Topological Map Abstraction for Reinforcement Learning

Authors

Zhao-Heng Yin, Wu-Jun Li

Abstract

Animals are able to discover the topological map (graph) of their surrounding environment and use it for navigation. Inspired by this biological phenomenon, researchers have recently proposed generating graph representations for Markov decision processes (MDPs) and using such graphs for planning in reinforcement learning (RL). However, existing graph generation methods suffer from a number of drawbacks. One is that they do not learn an abstraction for the graph, which results in high memory and computation cost. This also makes the generated graph non-robust, which degrades planning performance. Another is that existing methods cannot be used to facilitate exploration, which is important in RL. In this paper, we propose a new graph generation method called topological map abstraction (TOMA). TOMA can generate an abstract graph representation for an MDP at much lower memory and computation cost than existing methods. Furthermore, TOMA can be used to facilitate exploration. In particular, we propose planning to explore, in which TOMA accelerates exploration by guiding the agent towards unexplored states. A novel experience replay module called vertex memory is also proposed to improve exploration performance. Experimental results show that TOMA can outperform existing methods and achieve state-of-the-art performance.
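
To make the idea of planning to explore on an abstract graph more concrete, the following is a minimal sketch and not the method from the paper. It assumes a hand-written abstraction function that maps raw states to vertex ids (TOMA learns its abstraction; the abstract does not say how), uses visit counts as a stand-in for "unexplored", and plans with plain breadth-first search. The class name AbstractGraph and its methods are hypothetical, and vertex memory is not modelled.

```python
from collections import defaultdict, deque


class AbstractGraph:
    """Toy abstract graph: many raw states collapse onto one vertex."""

    def __init__(self, abstraction):
        # Assumption: `abstraction` is a fixed function state -> vertex id.
        self.abstraction = abstraction
        self.edges = defaultdict(set)   # vertex -> successor vertices (directed)
        self.visits = defaultdict(int)  # vertex -> number of arrivals

    def observe(self, state, next_state):
        """Record one environment transition at the vertex level."""
        u = self.abstraction(state)
        v = self.abstraction(next_state)
        self.visits.setdefault(u, 0)    # register the source vertex
        self.visits[v] += 1
        if u != v:
            self.edges[u].add(v)

    def plan_to_explore(self, state):
        """Return a vertex path to the least-visited reachable vertex."""
        start = self.abstraction(state)
        parent = {start: None}
        queue = deque([start])
        while queue:                    # breadth-first search from `start`
            u = queue.popleft()
            for v in sorted(self.edges[u]):
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        candidates = [v for v in parent if v != start]
        if not candidates:
            return [start]              # nowhere else to go yet
        # Least-visited vertex first; ties broken by vertex id.
        target = min(candidates, key=lambda v: (self.visits[v], v))
        path = []
        node = target
        while node is not None:         # walk parents back to `start`
            path.append(node)
            node = parent[node]
        return path[::-1]


if __name__ == "__main__":
    # Hypothetical 1-D world: states are integers, two states per vertex.
    graph = AbstractGraph(lambda s: s // 2)
    trajectory = [0, 1, 2, 3, 2, 3, 4]
    for s, t in zip(trajectory, trajectory[1:]):
        graph.observe(s, t)
    print(graph.plan_to_explore(0))
```

In this sketch the graph stores one node per abstract vertex rather than one per raw state, which is the source of the memory saving the abstract describes; a low-level policy would still be needed to follow the returned vertex path in the real environment.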
