Paper Title

Understanding the Evolution of Linear Regions in Deep Reinforcement Learning

Paper Authors

Setareh Cohan, Nam Hee Kim, David Rolnick, Michiel van de Panne

Paper Abstract

Policies produced by deep reinforcement learning are typically characterised by their learning curves, but they remain poorly understood in many other respects. ReLU-based policies result in a partitioning of the input space into piecewise linear regions. We seek to understand how observed region counts and their densities evolve during deep reinforcement learning using empirical results that span a range of continuous control tasks and policy network dimensions. Intuitively, we may expect that during training, the region density increases in the areas that are frequently visited by the policy, thereby affording fine-grained control. We use recent theoretical and empirical results for the linear regions induced by neural networks in supervised learning settings for grounding and comparison of our results. Empirically, we find that the region density increases only moderately throughout training, as measured along fixed trajectories coming from the final policy. However, the trajectories themselves also increase in length during training, and thus the region densities decrease as seen from the perspective of the current trajectory. Our findings suggest that the complexity of deep reinforcement learning policies does not principally emerge from a significant growth in the complexity of functions observed on-and-around trajectories of the policy.
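To make the tracked quantity concrete, the sketch below (not the authors' code; the network weights, the trajectory, and the sampling resolution are arbitrary stand-ins) estimates how many linear regions a small ReLU network crosses along a state trajectory by sampling ReLU activation patterns between consecutive states; the region density is this count divided by the trajectory length.

import numpy as np

rng = np.random.default_rng(0)

# Toy ReLU policy trunk: state_dim -> hidden -> hidden. The final linear action
# head is omitted because a purely linear output layer creates no new regions.
state_dim, hidden = 8, 32
W1 = rng.normal(size=(hidden, state_dim)); b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(hidden, hidden));    b2 = rng.normal(size=hidden)

def activation_pattern(s):
    # Binary on/off pattern of all ReLU units at input s; two inputs with the
    # same pattern lie in the same linear region of the network.
    h1 = W1 @ s + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

# Stand-in "trajectory": a sequence of visited states. Consecutive states are
# densely interpolated so that region transitions between them are not missed;
# in the paper's setting the states would come from policy rollouts instead.
trajectory = rng.normal(size=(50, state_dim))
samples_per_segment = 100

regions_crossed = 1                      # region containing the first state
trajectory_length = 0.0
prev_pattern = activation_pattern(trajectory[0])
for s0, s1 in zip(trajectory[:-1], trajectory[1:]):
    trajectory_length += np.linalg.norm(s1 - s0)
    for t in np.linspace(0.0, 1.0, samples_per_segment + 1)[1:]:
        pattern = activation_pattern((1 - t) * s0 + t * s1)
        if pattern != prev_pattern:      # crossed into a new linear region
            regions_crossed += 1
            prev_pattern = pattern

print("linear regions crossed along the trajectory:", regions_crossed)
print("region density (regions per unit state-space length):",
      regions_crossed / trajectory_length)

Note that sampling only approximates the exact region count: a finer interpolation, or an exact computation of hyperplane crossings along each segment, would give a tighter estimate.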
