Goal-Conditioned Generators of Deep Policies

Paper title

Goal-Conditioned Generators of Deep Policies

Authors

Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

Abstract

Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. Our code is public.
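
The core idea in the abstract, a network that takes a desired return as its command and outputs the weights of a policy, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the names init_generator, generate_policy and act, the layer sizes, and the choice of a linear policy are all hypothetical, picked only to show the data flow. The generator here is untrained; the training procedure, the weight-sharing HyperNetwork and the policy embeddings mentioned in the abstract are left out.

```python
import numpy as np

def init_generator(obs_dim, act_dim, hidden=16, seed=0):
    """Hypothetical generator: a small MLP mapping a scalar return command
    to the flattened parameters of a linear policy (assumed sizes)."""
    rng = np.random.default_rng(seed)
    n_policy_params = obs_dim * act_dim + act_dim  # weight matrix + bias
    return {
        "W1": rng.normal(0.0, 0.5, size=(1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, size=(hidden, n_policy_params)),
        "b2": np.zeros(n_policy_params),
        "obs_dim": obs_dim,
        "act_dim": act_dim,
    }

def generate_policy(gen, desired_return):
    """Map the command 'achieve this return' to policy weights (W, b)."""
    c = np.array([[float(desired_return)]])          # command, shape (1, 1)
    h = np.tanh(c @ gen["W1"] + gen["b1"])           # hidden layer, shape (1, hidden)
    flat = (h @ gen["W2"] + gen["b2"]).ravel()       # flattened policy parameters
    split = gen["obs_dim"] * gen["act_dim"]
    W = flat[:split].reshape(gen["obs_dim"], gen["act_dim"])
    b = flat[split:]
    return W, b

def act(policy, obs):
    """Run the generated linear policy; tanh keeps actions in [-1, 1]."""
    W, b = policy
    return np.tanh(obs @ W + b)

# Different commands yield different policies from the same generator.
gen = init_generator(obs_dim=3, act_dim=2)
policy_low = generate_policy(gen, desired_return=0.0)
policy_high = generate_policy(gen, desired_return=1.0)
obs = np.ones(3)
action = act(policy_high, obs)
```

In this sketch the only input to the generator is the return command, so producing a better policy amounts to querying the same generator with a higher command value, which is the mechanism the abstract describes for iteratively finding better policies.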
