Paper Title
Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning
Paper Authors
Paper Abstract
Deep Reinforcement Learning (RL) agents often overfit the training environment, leading to poor generalization performance. In this paper, we propose Thinker, a bootstrapping method that removes the adversarial effects of confounding features from observations in an unsupervised way, thereby improving RL agents' generalization. Thinker first clusters experience trajectories into several clusters. These trajectories are then bootstrapped by applying a style transfer generator, which translates the trajectories from one cluster's style to another while maintaining the content of the observations. The bootstrapped trajectories are then used for policy learning. Thinker is widely applicable across many RL settings. Experimental results reveal that Thinker leads to better generalization capability in the Procgen benchmark environments compared to base algorithms and several data augmentation techniques.
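The pipeline described in the abstract (cluster trajectories, translate styles across clusters, feed the augmented data to policy learning) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names (`cluster_trajectories`, `style_transfer`) are hypothetical, the clustering is a naive k-means over flattened observations, and the "style transfer" is a toy mean-shift stand-in for whatever learned generator the paper actually uses.

```python
import numpy as np

def cluster_trajectories(obs, k=2, iters=10, seed=0):
    """Naive k-means over flattened observations -- a stand-in for the
    paper's trajectory clustering step."""
    rng = np.random.default_rng(seed)
    centers = obs[rng.choice(len(obs), k, replace=False)]
    for _ in range(iters):
        # assign each observation to its nearest center
        dists = ((obs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = obs[labels == j].mean(axis=0)
    return labels, centers

def style_transfer(obs, labels, centers, target):
    """Toy 'style transfer': keep each observation's content (deviation
    from its own cluster mean) while adopting the target cluster's style
    (the target mean). A learned generator would replace this."""
    content = obs - centers[labels]   # preserve per-observation content
    return content + centers[target]  # swap in the target cluster's style

# Usage: bootstrap a batch of observations into another cluster's style.
rng = np.random.default_rng(1)
obs = np.vstack([rng.normal(0, 1, (20, 4)),   # "style A" observations
                 rng.normal(5, 1, (20, 4))])  # "style B" observations
labels, centers = cluster_trajectories(obs, k=2)
augmented = style_transfer(obs, labels, centers, target=0)
# The augmented trajectories would then be passed to the base RL
# algorithm's policy-learning step alongside the originals.
```

The key property being illustrated is that the augmentation preserves within-cluster structure ("content") while varying the cluster-level statistics ("style"), which is what lets the policy learn features invariant to confounding visual differences.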