Paper Title
Representation Learning for Continuous Action Spaces is Beneficial for Efficient Policy Learning
Paper Authors
Paper Abstract
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL) with the help of the perception capability of deep learning, and has been widely applied to real-world problems. Model-free RL, an efficient class of DRL methods, learns state representations simultaneously with the policy in an end-to-end manner when facing large-scale continuous state and action spaces. However, training such a large policy model requires a large number of trajectory samples and a long training time. Moreover, the learned policy often fails to generalize over large-scale action spaces, especially continuous ones. To address this issue, in this paper we propose an efficient policy learning method that operates in latent state and action spaces. More specifically, we extend the idea of state representations to action representations for better policy generalization. Meanwhile, we divide the whole learning task into two parts: learning the large-scale representation models in an unsupervised manner, and learning the small-scale policy model with RL. The small policy model facilitates policy learning, while generalization and expressiveness are preserved by the large representation models. Finally, the effectiveness of the proposed method is demonstrated by experiments on MountainCar, CarRacing, and Cheetah.
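To make the two-level structure described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of how a policy operating in latent state and action spaces could be organized. All module names, network sizes, and dimensions (StateEncoder, ActionDecoder, LatentPolicy, STATE_DIM, and so on) are illustrative assumptions: the large state encoder and action decoder would be trained in an unsupervised manner, while only the small policy mapping latent states to latent actions is trained with RL.

# Minimal sketch, assuming an encoder/decoder split around a small latent policy.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 24, 6        # assumed raw environment dimensions
LATENT_STATE, LATENT_ACTION = 8, 3   # assumed latent dimensions

class StateEncoder(nn.Module):
    """Large representation model: raw state -> latent state (trained unsupervised)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT_STATE))
    def forward(self, s):
        return self.net(s)

class ActionDecoder(nn.Module):
    """Large representation model: latent action -> raw action (trained unsupervised)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_ACTION, 256), nn.ReLU(),
                                 nn.Linear(256, ACTION_DIM), nn.Tanh())
    def forward(self, z_a):
        return self.net(z_a)

class LatentPolicy(nn.Module):
    """Small policy model trained with RL: latent state -> latent action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_STATE, 32), nn.ReLU(),
                                 nn.Linear(32, LATENT_ACTION))
    def forward(self, z_s):
        return self.net(z_s)

encoder, decoder, policy = StateEncoder(), ActionDecoder(), LatentPolicy()

def act(state):
    """Inference pipeline: encode the state, decide in latent space, decode the action."""
    with torch.no_grad():
        return decoder(policy(encoder(state)))

print(act(torch.randn(1, STATE_DIM)).shape)  # torch.Size([1, 6])

In this sketch, only LatentPolicy's parameters would be updated by the RL objective, which keeps the trainable policy small while the frozen encoder and decoder carry the representational capacity.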