Meta-learning curiosity algorithms

Paper title

Meta-learning curiosity algorithms

Authors

Ferran Alet, Martin F. Schneider, Tomas Lozano-Perez, Leslie Pack Kaelbling

Abstract

We hypothesize that curiosity is a mechanism found by evolution that encourages meaningful exploration early in an agent's life in order to expose it to experiences that enable it to obtain high rewards over the course of its lifetime. We formulate the problem of generating curious behavior as one of meta-learning: an outer loop will search over a space of curiosity mechanisms that dynamically adapt the agent's reward signal, and an inner loop will perform standard reinforcement learning using the adapted reward signal. However, current meta-RL methods based on transferring neural network weights have only generalized between very similar tasks. To broaden the generalization, we instead propose to meta-learn algorithms: pieces of code similar to those designed by humans in ML papers. Our rich language of programs combines neural networks with other building blocks such as buffers, nearest-neighbor modules and custom loss functions. We demonstrate the effectiveness of the approach empirically, finding two novel curiosity algorithms that perform on par with or better than human-designed published curiosity algorithms in domains as disparate as grid navigation with image inputs, acrobot, lunar lander, ant and hopper.
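
The two-loop structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The chain environment, the tabular Q-learning agent, the three count-based candidate mechanisms, and every name and constant below are assumptions chosen for brevity, and the paper's actual program language (neural networks, buffers, nearest-neighbor modules, custom losses) is far richer. The inner loop learns from the extrinsic reward plus a curiosity bonus, while the outer loop scores each candidate by extrinsic reward only.

```python
import random
from collections import defaultdict

# Hypothetical candidate "curiosity programs": each maps the visit count of the
# next state to an intrinsic reward. They stand in for the paper's much richer
# program space and are assumptions of this sketch.
CANDIDATES = {
    "no_curiosity": lambda count: 0.0,
    "inverse_count": lambda count: 1.0 / (1.0 + count),
    "inverse_sqrt_count": lambda count: 1.0 / (1.0 + count) ** 0.5,
}

CHAIN_LENGTH = 12    # states 0..11; extrinsic reward only at the right end
EPISODE_STEPS = 30   # fixed-length episodes; reaching the end does not reset
EPISODES = 40


def step(state, action):
    """Toy chain environment: action 1 moves right, action 0 moves left."""
    if action == 1:
        next_state = min(CHAIN_LENGTH - 1, state + 1)
    else:
        next_state = max(0, state - 1)
    reward = 1.0 if next_state == CHAIN_LENGTH - 1 else 0.0
    return next_state, reward


def inner_loop(curiosity, seed):
    """Tabular Q-learning on the adapted reward; returns total extrinsic reward."""
    rng = random.Random(seed)
    q = defaultdict(float)      # (state, action) -> estimated value
    visits = defaultdict(int)   # buffer of per-state visit counts
    total_extrinsic = 0.0
    for _ in range(EPISODES):
        state = 0
        for _ in range(EPISODE_STEPS):
            if rng.random() < 0.1:
                action = rng.randrange(2)  # epsilon-greedy exploration
            else:
                best = max(q[(state, 0)], q[(state, 1)])
                # Greedy action, with ties broken at random.
                action = rng.choice([a for a in (0, 1) if q[(state, a)] == best])
            next_state, reward = step(state, action)
            # The curiosity mechanism adapts the reward the learner sees.
            adapted = reward + curiosity(visits[next_state])
            visits[next_state] += 1
            target = adapted + 0.95 * max(q[(next_state, 0)], q[(next_state, 1)])
            q[(state, action)] += 0.5 * (target - q[(state, action)])
            total_extrinsic += reward  # the outer loop only sees this quantity
            state = next_state
    return total_extrinsic


def outer_loop(seeds=(0, 1, 2)):
    """Score every candidate by mean extrinsic reward and return the best one."""
    scores = {
        name: sum(inner_loop(fn, seed) for seed in seeds) / len(seeds)
        for name, fn in CANDIDATES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best_name, all_scores = outer_loop()
    print(best_name, all_scores)
```

The design point this sketch tries to capture is the separation of objectives: the learner optimizes the adapted reward, but candidates are ranked by the extrinsic reward accumulated over the agent's whole lifetime. Exhaustive enumeration of three candidates stands in here for a search over a program space.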
