Paper title
On learning history-based policies for controlling Markov decision processes
Paper authors
Paper abstract
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in a Markov decision process (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
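To make the memory-less versus history-based distinction in the abstract concrete, here is a minimal sketch in PyTorch of both kinds of policies. This is a generic illustration of history-based function approximation, not the algorithm proposed in the paper; all class names, dimensions, and hyperparameters are hypothetical.

```python
# Minimal sketch contrasting a memory-less policy with a history-based one.
# Generic illustration only; this is NOT the paper's proposed algorithm.
import torch
import torch.nn as nn

class MemorylessPolicy(nn.Module):
    """Maps only the current observation to action logits."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )

    def forward(self, obs):  # obs: (batch, obs_dim)
        return self.net(obs)

class HistoryBasedPolicy(nn.Module):
    """Compresses the observation history into a recurrent feature
    (a learned history-based abstraction) before choosing an action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_history):  # obs_history: (batch, T, obs_dim)
        features, _ = self.rnn(obs_history)   # (batch, T, hidden)
        return self.head(features[:, -1])     # act on the latest history summary

# Usage: when feature abstraction aliases distinct states, a single
# observation is ambiguous; the recurrent summary can disambiguate them.
obs_history = torch.randn(8, 10, 4)  # batch of 8, horizon 10, obs_dim 4
logits = HistoryBasedPolicy(obs_dim=4, act_dim=2)(obs_history)
```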