Paper Title
Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions
Paper Authors
Paper Abstract
Generalization across different environments with the same task is critical for successfully applying visual reinforcement learning (RL) in real scenarios. However, visual distractions, which are common in real scenes, can corrupt the representations learned from high-dimensional observations in visual RL, thus degrading generalization performance. To tackle this problem, we propose a novel approach, Characteristic Reward Sequence Prediction (CRESP), which extracts task-relevant information by learning reward sequence distributions (RSDs), since reward signals in RL are task-relevant and invariant to visual distractions. Specifically, to effectively capture task-relevant information via RSDs, CRESP introduces an auxiliary task, namely predicting the characteristic functions of RSDs, to learn task-relevant representations; this is possible because a high-dimensional distribution can be well approximated through its corresponding characteristic function. Experiments demonstrate that CRESP significantly improves generalization performance on unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with different visual distractions.
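For intuition, below is a minimal sketch of the kind of auxiliary objective the abstract describes: regressing a prediction head onto the empirical characteristic function phi(t) = E[exp(i<t, r_1..k>)] of sampled reward sequences. This is not the authors' implementation; `CFPredictor`, the frequency vectors `t`, and all shapes and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

def empirical_cf(reward_seqs: torch.Tensor, t: torch.Tensor):
    # Empirical characteristic function of length-k reward sequences.
    # reward_seqs: (batch, n_samples, k); t: (n_freq, k).
    # Returns real and imaginary parts, each of shape (batch, n_freq).
    angles = torch.einsum('bnk,fk->bnf', reward_seqs, t)  # <t, r_1..k>
    return angles.cos().mean(dim=1), angles.sin().mean(dim=1)

class CFPredictor(nn.Module):
    # Hypothetical head mapping a state representation z and a frequency
    # vector t to a prediction of (Re phi(t), Im phi(t)).
    def __init__(self, repr_dim: int, k: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim + k, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, z: torch.Tensor, t: torch.Tensor):
        # z: (batch, repr_dim); t: (n_freq, k).
        zt = torch.cat(
            [z.unsqueeze(1).expand(-1, t.size(0), -1),
             t.unsqueeze(0).expand(z.size(0), -1, -1)], dim=-1)
        out = self.net(zt)              # (batch, n_freq, 2)
        return out[..., 0], out[..., 1]

def cresp_aux_loss(predictor: CFPredictor, z: torch.Tensor,
                   reward_seqs: torch.Tensor, t: torch.Tensor):
    # Regression target: the empirical characteristic function of the
    # sampled reward sequences. Gradients flow into z, shaping the
    # representation toward task-relevant (reward-predictive) features.
    re_t, im_t = empirical_cf(reward_seqs, t)
    re_p, im_p = predictor(z, t)
    return ((re_p - re_t) ** 2 + (im_p - im_t) ** 2).mean()
```

Because the characteristic function uniquely determines a distribution, matching it on sampled frequencies `t` lets the head approximate the whole reward sequence distribution without modeling its density directly; visual distractors, which do not affect rewards, provide no signal for this loss.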