Paper Title
Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes
Paper Authors
Paper Abstract
Our work aims at developing reinforcement learning algorithms that do not rely on the Markov assumption. We consider the class of Non-Markov Decision Processes where histories can be abstracted into a finite set of states while preserving the dynamics. We call it a Markov abstraction since it induces a Markov Decision Process over a set of states that encode the non-Markov dynamics. This phenomenon underlies the recently introduced Regular Decision Processes (as well as POMDPs where only a finite number of belief states is reachable). In all such kinds of decision process, an agent that uses a Markov abstraction can rely on the Markov property to achieve optimal behaviour. We show that Markov abstractions can be learned during reinforcement learning. Our approach combines automata learning and classic reinforcement learning. For these two tasks, standard algorithms can be employed. We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees, and we also provide an experimental evaluation.
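The abstract describes the approach only at a high level. The following is a minimal, hypothetical sketch of the general idea, not the authors' algorithm: it assumes the Markov abstraction is already available (here a hand-coded two-state automaton, abstraction_step), whereas in the paper the abstraction would itself be learned with an automata-learning algorithm, and it runs standard tabular Q-learning over the induced states (automaton state, observation). The toy environment, state names such as 'seen_b', and all hyperparameters are illustrative assumptions.

import random
from collections import defaultdict

def step(history, action):
    """One step of a toy non-Markov task: action 1 ("claim") pays off only
    if the observation 'b' has appeared earlier in the episode."""
    if action == 1:
        reward = 1.0 if 'b' in history else -1.0
    else:
        reward = 0.0
    obs = random.choice(['a', 'b'])
    return obs, reward

def abstraction_step(q, obs):
    """Hand-coded two-state abstraction: remembers whether 'b' was seen.
    (In the paper, this automaton would be learned, not given.)"""
    return 'seen_b' if q == 'seen_b' or obs == 'b' else 'init'

# Tabular Q-learning over the induced Markov states (automaton state, observation).
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.1
actions = [0, 1]

for episode in range(2000):
    history, obs, q = [], 'a', 'init'
    for t in range(20):
        s = (q, obs)
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        next_obs, reward = step(history, a)
        history.append(obs)                  # the raw history keeps growing...
        next_q = abstraction_step(q, obs)    # ...but the abstraction stays finite
        s_next = (next_q, next_obs)
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        q, obs = next_q, next_obs

# After training, the greedy policy claims (action 1) only once 'b' has been seen.
print(Q[(('init', 'a'), 1)], Q[(('seen_b', 'a'), 1)])

Because the pair (automaton state, observation) is Markov for this task, the agent can rely on ordinary value-based reinforcement learning even though rewards depend on the unbounded history; this is the property the paper exploits when combining automata learning with classic reinforcement learning.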