Title

Mesh Based Analysis of Low Fractal Dimension Reinforcement Learning Policies

Authors

Sean Gillen, Katie Byl

Abstract

In previous work, using a process we call meshing, the reachable state spaces for various continuous and hybrid systems were approximated as a discrete set of states which can then be synthesized into a Markov chain. One of the applications for this approach has been to analyze locomotion policies obtained by reinforcement learning, in a step towards making empirical guarantees about the stability properties of the resulting system. In a separate line of research, we introduced a modified reward function for on-policy reinforcement learning algorithms that utilizes a "fractal dimension" of rollout trajectories. This reward was shown to encourage policies that induce individual trajectories which can be more compactly represented as a discrete mesh. In this work we combine these two threads of research by building meshes of the reachable state space of a system subject to disturbances and controlled by policies obtained with the modified reward. Our analysis shows that the modified policies do produce much smaller reachable meshes. This shows that agents trained with the fractal dimension reward transfer their desirable quality of having a more compact state space to a setting with external disturbances. The results also suggest that the previous work using mesh based tools to analyze RL policies may be extended to higher dimensional systems or to higher resolution meshes than would have otherwise been possible.
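To make the two ideas in the abstract concrete, here is a minimal sketch of (a) "meshing": snapping rollout states to a grid so the reachable set becomes a finite set of cells whose transitions form a Markov chain, and (b) a box-counting estimate of the fractal dimension of a trajectory, the quantity the modified reward penalizes. The grid spacing `d`, the snapping rule, and the toy data are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def mesh_trajectories(trajectories, d):
    """Approximate the reachable state space by snapping each visited state
    to a grid cell of spacing d (an assumed, simplified meshing rule).
    Returns the set of occupied cells and cell-to-cell transition counts,
    which could be normalized into a Markov chain."""
    mesh = set()
    transitions = {}
    for traj in trajectories:
        cells = [tuple(np.round(np.asarray(s) / d).astype(int)) for s in traj]
        mesh.update(cells)
        for a, b in zip(cells, cells[1:]):
            transitions[(a, b)] = transitions.get((a, b), 0) + 1
    return mesh, transitions

def box_count(points, d):
    """Number of grid boxes of side d occupied by the points."""
    return len({tuple(np.floor(np.asarray(p) / d).astype(int)) for p in points})

def fractal_dimension(points, ds):
    """Box-counting dimension estimate: slope of log N(d) vs log(1/d).
    A lower estimate means the trajectory occupies fewer boxes as the
    grid is refined, i.e. it admits a more compact mesh."""
    Ns = [box_count(points, d) for d in ds]
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(ds)), np.log(Ns), 1)
    return slope

# Toy example: two noisy 2D random-walk "rollouts" (illustrative data only).
rng = np.random.default_rng(0)
trajs = [np.cumsum(rng.normal(size=(50, 2)) * 0.1, axis=0) for _ in range(2)]
mesh, trans = mesh_trajectories(trajs, d=0.2)
print(len(mesh))  # mesh size: fewer cells = more compact reachable set
```

A policy shaped by the fractal-dimension reward would, per the abstract, drive `fractal_dimension` of its rollouts down, and the claim verified in this work is that the resulting `mesh` stays small even when disturbances are added.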
