Paper title
Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems
Authors
Abstract
Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. With these components being learned from data, it is important to be able to analyze what behaviors were learned and how this affects closed-loop performance. In this paper we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus our attention on feedforward neural networks with the rectified linear unit (ReLU) activation, which are known to implement continuous piecewise-affine (PWA) functions. We describe the Reachable Polyhedral Marching (RPM) algorithm for enumerating the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that we find these sets incrementally, without Lyapunov-based tools. In our examples we demonstrate the ability of our approach to find non-convex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides. We show this acceleration to lead to a 15x speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.
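The abstract's starting point, that a feedforward ReLU network implements a continuous piecewise-affine function, can be illustrated with a minimal sketch. The code below is not the RPM algorithm and is not taken from the paper. It only recovers the single affine piece that contains a given input, by recording which ReLUs are active there. The function names (`local_affine_map`, `forward`), the layer sizes, and the random weights are all assumptions made for illustration, and the output layer is assumed to be linear with no ReLU.

```python
import numpy as np

def local_affine_map(weights, biases, x):
    """Return (C, d, pattern) such that the network equals C @ y + d
    for every y in the affine region that contains x.
    Hypothetical helper for illustration, not the paper's code."""
    C = np.eye(len(x))
    d = np.zeros(len(x))
    pattern = []
    for i, (W, b) in enumerate(zip(weights, biases)):
        # Compose the affine map accumulated so far with this layer.
        C, d = W @ C, W @ d + b
        if i < len(weights) - 1:  # ReLU on hidden layers only (assumption)
            active = (C @ x + d) > 0
            pattern.append(active)
            # Inactive neurons output zero, so zero their rows.
            C = C * active[:, None]
            d = d * active
    return C, d, pattern

def forward(weights, biases, x):
    """Plain forward pass, used to check the local affine map."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

# Small random network: 2 inputs, two hidden layers of 8, 2 outputs.
rng = np.random.default_rng(0)
sizes = [2, 8, 8, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

x = np.array([0.3, -0.2])
C, d, pattern = local_affine_map(weights, biases, x)
```

At `x`, `C @ x + d` agrees with `forward(weights, biases, x)`. The activation pattern fixes the polyhedral region on which this affine map is valid. Enumerating all such regions, which is the task the abstract assigns to RPM, requires more than this single-point evaluation.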