Paper Title
Programmatic Policy Extraction by Iterative Local Search
Authors
Abstract
Reinforcement learning policies are often represented by neural networks, but programmatic policies are preferred in some cases because they are more interpretable, amenable to formal verification, or generalize better. While efficient algorithms for learning neural policies exist, learning programmatic policies is challenging. Combining imitation-projection and dataset aggregation with a local search heuristic, we present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy. After examining our local search heuristic on a programming-by-example problem, we demonstrate our programmatic policy extraction method on a pendulum swing-up problem. Whether trained from a hand-crafted expert policy or from a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.