Paper Title

Learning user-defined sub-goals using memory editing in reinforcement learning

Author

Lee, GyeongTaek

Abstract

The aim of reinforcement learning (RL) is to allow the agent to achieve the final goal. Most RL studies have focused on improving learning efficiency so that the final goal is reached faster. However, it is very difficult for an RL model to modify its intermediate route in the process of reaching the final goal; that is, in existing studies the agent cannot be controlled to achieve other sub-goals. If the agent could pass through sub-goals on the way to its destination, RL could be applied and studied in a wider variety of fields. In this study, I propose a methodology to achieve user-defined sub-goals, as well as the final goal, using memory editing. Memory editing is performed to generate various sub-goals and to give an additional reward to the agent. In addition, the sub-goals are learned separately from the final goal. I set up two simple environments with various scenarios as test environments. As a result, the agent almost always successfully passed through the sub-goals, as well as the final goal, under control. Moreover, the agent could be indirectly induced to visit novel states in the environments. I expect that this methodology can be used in fields that need to control the agent across a variety of scenarios.
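The abstract describes memory editing as relabeling stored experience with user-defined sub-goals and granting an additional reward when a sub-goal is reached. The sketch below illustrates that idea on a toy replay buffer; the names (`ReplayBuffer`, `memory_edit`, `SUBGOAL_BONUS`) and the exact relabeling rule are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import deque

# Assumed extra reward granted when a transition reaches the sub-goal.
SUBGOAL_BONUS = 1.0

class ReplayBuffer:
    """Toy replay buffer storing (state, action, reward, next_state, goal)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, goal):
        self.buffer.append((state, action, reward, next_state, goal))

    def memory_edit(self, subgoal):
        """Copy stored transitions, relabeled toward a user-defined sub-goal.

        Transitions whose next_state matches the sub-goal receive an
        additional reward, producing edited experience from which the
        sub-goal can be learned separately from the final goal.
        """
        edited = []
        for state, action, reward, next_state, _ in self.buffer:
            bonus = SUBGOAL_BONUS if next_state == subgoal else 0.0
            edited.append((state, action, reward + bonus, next_state, subgoal))
        return edited

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Usage: a short grid-world trajectory, relabeled toward sub-goal (1, 1).
buf = ReplayBuffer()
buf.push((0, 0), "right", 0.0, (0, 1), (2, 2))
buf.push((0, 1), "down", 0.0, (1, 1), (2, 2))
buf.push((1, 1), "down", 1.0, (2, 1), (2, 2))

edited = buf.memory_edit(subgoal=(1, 1))
# The second transition reaches (1, 1), so it gains the sub-goal bonus.
```

This relabeling is conceptually close to hindsight-style goal relabeling, except that here the sub-goals are supplied by the user rather than drawn from achieved states.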
