Paper Title
A Fully Controllable Agent in the Path Planning using Goal-Conditioned Reinforcement Learning
Paper Authors
Paper Abstract
The aim of path planning is to reach a goal from the starting point by searching for a route for the agent. In path planning, routes may vary depending on a number of variables, so it is important for the agent to be able to reach various goals. Numerous studies, however, have dealt with a single goal that is predefined by the user. In the present study, I propose a novel reinforcement learning framework for a fully controllable agent in path planning. To this end, I propose bi-directional memory editing to obtain various bi-directional trajectories of the agent, on which the agent's behavior toward sub-goals is trained with goal-conditioned RL. To move the agent in various directions, I utilize a sub-goal dedicated network, separate from the policy network. Lastly, I present a reward shaping scheme to reduce the number of steps the agent needs to reach the goal. In the experiments, the agent was able to reach various goals that it had never visited during training. We confirmed that the agent could perform difficult missions, such as a round trip, and that it took shorter routes with the reward shaping.
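The abstract only sketches the method, but the core idea of bi-directional memory editing can be illustrated as hindsight-style goal relabeling applied in both the forward and the reversed direction of a trajectory. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a grid-world with a reversible action space, and the names (`Transition`, `relabel_bidirectional`, `INVERSE_ACTION`) are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of bi-directional memory editing:
# one rollout is turned into goal-conditioned transitions in both directions.

from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[int, int]          # (row, col) position in a grid world

@dataclass
class Transition:
    state: State                 # current position
    goal: State                  # relabeled goal the policy is conditioned on
    action: int                  # 0: up, 1: down, 2: left, 3: right
    next_state: State

# In a grid world each action has an exact inverse, which is what makes the
# reversed (backward) trajectory physically valid.
INVERSE_ACTION = {0: 1, 1: 0, 2: 3, 3: 2}

def relabel_bidirectional(states: List[State], actions: List[int]) -> List[Transition]:
    """Turn one rollout into forward and backward goal-conditioned transitions."""
    out: List[Transition] = []

    # Forward pass: every later state in the rollout can serve as a goal
    # for earlier steps (hindsight-style relabeling).
    for t in range(len(actions)):
        for g in range(t + 1, len(states)):
            out.append(Transition(states[t], states[g], actions[t], states[t + 1]))

    # Backward pass: replay the trajectory in reverse with inverted actions,
    # so earlier states become reachable goals as well.
    for t in range(len(actions), 0, -1):
        for g in range(t - 1, -1, -1):
            out.append(Transition(states[t], states[g],
                                  INVERSE_ACTION[actions[t - 1]], states[t - 1]))
    return out

if __name__ == "__main__":
    # A short rollout: the agent moves right twice, then down once.
    states = [(0, 0), (0, 1), (0, 2), (1, 2)]
    actions = [3, 3, 1]
    for tr in relabel_bidirectional(states, actions)[:4]:
        print(tr)
```

The relabeled transitions could then be used to train a goal-conditioned policy; how the paper combines them with the sub-goal dedicated network and the reward shaping is described in the full text, not in this sketch.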