Paper title
On learning history-based policies for controlling Markov decision processes
Paper authors
Paper abstract
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in a Markov decision process (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
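To make the memory-less versus history-based distinction in the abstract concrete, here is a minimal sketch in PyTorch of both kinds of policies. This is a generic illustration of history-based function approximation, not the algorithm proposed in the paper; all class names, dimensions, and hyperparameters are hypothetical.

```python
# Minimal sketch contrasting a memory-less policy with a history-based one.
# Generic illustration only; this is NOT the paper's proposed algorithm.
import torch
import torch.nn as nn

class MemorylessPolicy(nn.Module):
    """Maps only the current observation to action logits."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )

    def forward(self, obs):  # obs: (batch, obs_dim)
        return self.net(obs)

class HistoryBasedPolicy(nn.Module):
    """Compresses the observation history into a recurrent feature
    (a learned history-based abstraction) before choosing an action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_history):  # obs_history: (batch, T, obs_dim)
        features, _ = self.rnn(obs_history)   # (batch, T, hidden)
        return self.head(features[:, -1])     # act on the latest history summary

# Usage: when feature abstraction aliases distinct states, a single
# observation is ambiguous; the recurrent summary can disambiguate them.
obs_history = torch.randn(8, 10, 4)  # batch of 8, horizon 10, obs_dim 4
logits = HistoryBasedPolicy(obs_dim=4, act_dim=2)(obs_history)
```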