SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning

Paper Title

SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning

Authors

Andrew Chester, Michael Dann, Fabio Zambetta, John Thangarajah

Abstract

Model-based reinforcement learning algorithms are typically more sample efficient than their model-free counterparts, especially in sparse reward problems. Unfortunately, many interesting domains are too complex to specify the complete models required by traditional model-based approaches. Learning a model takes a large number of environment samples, and may not capture critical information if the environment is hard to explore. If we could specify an incomplete model and allow the agent to learn how best to use it, we could take advantage of our partial understanding of many domains. Existing hybrid planning and learning systems which address this problem often impose highly restrictive assumptions on the sorts of models which can be used, limiting their applicability to a wide range of domains. In this work we propose SAGE, an algorithm combining learning and planning to exploit a previously unusable class of incomplete models. This combines the strengths of symbolic planning and neural learning approaches in a novel way that outperforms competing methods on variations of taxi world and Minecraft.
