Title

Multitask Neuroevolution for Reinforcement Learning with Long and Short Episodes

Authors

Nick Zhang, Abhishek Gupta, Zefeng Chen, Yew-Soon Ong

Abstract

Studies have shown evolution strategies (ES) to be a promising approach for reinforcement learning (RL) with deep neural networks. However, the issue of high sample complexity persists in applications of ES to deep RL over long horizons. This paper is the first to address the shortcoming of today's methods via a novel neuroevolutionary multitasking (NuEMT) algorithm, designed to transfer information from a set of auxiliary tasks (of short episode length) to the target (full length) RL task at hand. The auxiliary tasks, extracted from the target, allow an agent to update and quickly evaluate policies on shorter time horizons. The evolved skills are then transferred to guide the longer and harder task towards an optimal policy. We demonstrate that the NuEMT algorithm achieves data-efficient evolutionary RL, reducing expensive agent-environment interaction data requirements. Our key algorithmic contribution in this setting is to introduce, for the first time, a multitask skills transfer mechanism based on the statistical importance sampling technique. In addition, an adaptive resource allocation strategy is utilized to assign computational resources to auxiliary tasks based on their gleaned usefulness. Experiments on a range of continuous control tasks from the OpenAI Gym confirm that our proposed algorithm is efficient compared to recent ES baselines.
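The abstract describes the algorithm only at a high level. Below is a minimal, self-contained Python sketch of the three ideas it names: short-episode auxiliary tasks evolved by a simple Gaussian ES, transfer of auxiliary-task samples into the target task's update via statistical importance sampling, and a gain-based reallocation of rollout budget. This is not the authors' NuEMT implementation; every name here (`Task`, `rollout_return`, the synthetic return function, the clipping and softmax heuristics, all hyperparameters) is an illustrative assumption, and a real setup would replace `rollout_return` with episodic rollouts in an OpenAI Gym environment.

```python
# Conceptual sketch (not the authors' code) of the mechanisms named in the abstract:
# per-task Gaussian search distributions, short-horizon auxiliary tasks, importance-
# sampling reweighting for transfer to the full-length target task, and a simple
# adaptive rollout-budget allocation heuristic.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8        # toy policy-parameter dimension
SIGMA = 0.3    # shared exploration noise for all search distributions

def rollout_return(theta, horizon):
    """Synthetic stand-in for the episodic return of policy `theta` over `horizon` steps."""
    # Shorter horizons act as cheap, partial (but aligned) proxies of the full objective.
    return -np.sum((theta - 1.0) ** 2) * (horizon / 1000.0) + rng.normal(scale=0.05)

class Task:
    """One RL task, identified by its episode horizon, with a Gaussian search distribution."""
    def __init__(self, horizon):
        self.horizon = horizon
        self.mu = np.zeros(DIM)

    def sample(self, n):
        return self.mu + SIGMA * rng.standard_normal((n, DIM))

    def log_prob(self, thetas):
        # Isotropic-Gaussian log-density up to a constant (constants cancel in ratios).
        return -0.5 * np.sum((thetas - self.mu) ** 2, axis=1) / SIGMA ** 2

    def es_step(self, thetas, returns, weights=None, lr=0.05):
        # (Importance-)weighted vanilla-ES update of the distribution mean.
        w = np.ones(len(thetas)) if weights is None else weights
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        grad = (w * adv) @ (thetas - self.mu) / (len(thetas) * SIGMA)
        self.mu = self.mu + lr * grad

target = Task(horizon=1000)                    # full-length target task
aux_tasks = [Task(h) for h in (100, 300)]      # short-episode auxiliary tasks
budget = np.array([16, 16])                    # rollouts allotted to each auxiliary task
prev_mean = np.zeros(len(aux_tasks))           # crude per-task usefulness tracker

for gen in range(100):
    gains, aux_thetas, aux_returns, aux_logq = [], [], [], []

    # 1) Evolve each auxiliary task cheaply on its own short horizon.
    for k, (task, n) in enumerate(zip(aux_tasks, budget)):
        thetas = task.sample(int(n))
        rets = np.array([rollout_return(t, task.horizon) for t in thetas])
        aux_logq.append(task.log_prob(thetas))  # density under the *sampling* distribution
        task.es_step(thetas, rets)
        gains.append(rets.mean() - prev_mean[k])
        prev_mean[k] = rets.mean()
        aux_thetas.append(thetas)
        aux_returns.append(rets)

    # 2) Target update: its own full-length samples plus transferred auxiliary samples,
    #    reweighted by the importance ratio p_target(theta) / p_aux(theta). The short-
    #    horizon returns are reused as cheap proxies for the full-length return.
    own = target.sample(8)
    own_rets = np.array([rollout_return(t, target.horizon) for t in own])
    trans = np.vstack(aux_thetas)
    trans_rets = np.concatenate(aux_returns)
    log_ratio = target.log_prob(trans) - np.concatenate(aux_logq)
    is_weights = np.exp(np.clip(log_ratio, -20.0, 2.0))  # clip to control variance
    target.es_step(np.vstack([own, trans]),
                   np.concatenate([own_rets, trans_rets]),
                   np.concatenate([np.ones(len(own)), is_weights]))

    # 3) Adaptive resource allocation: shift rollout budget toward auxiliary tasks
    #    whose search distributions improved most in this generation.
    g = np.array(gains)
    soft = np.exp(g - g.max())
    budget = np.maximum(4, np.round(32 * soft / soft.sum())).astype(int)

    if gen % 20 == 0:
        print(f"gen {gen:3d}  target return ~ {rollout_return(target.mu, target.horizon):.3f}")
```

The point the sketch tries to make concrete is the reweighting in step 2: samples drawn under an auxiliary task's search distribution can contribute to the target task's update only after being scaled by the density ratio p_target(theta) / p_aux(theta), which is what the abstract refers to as importance-sampling-based skill transfer. The specific proxy-return, clipping, and softmax-allocation rules above are placeholders, not the paper's formulation.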
