Paper Title
Learning Interpretable, High-Performing Policies for Autonomous Driving
Paper Authors
Paper Abstract
Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that match or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration.
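To give a feel for the core idea of optimizing a decision-tree-like policy by gradient descent, the sketch below shows a single soft decision node. This is an illustrative assumption, not the authors' implementation: during training the split is a differentiable sigmoid gate over a weighted feature combination, and for an interpretable ("crisp") policy the node keeps only its strongest feature and applies a hard threshold.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftDecisionNode:
    """Illustrative differentiable decision node (hypothetical sketch,
    not the ICCT authors' code).

    Training uses a soft sigmoid gate so gradients flow to the split
    weights; deployment can use a sparse, hard split over a single
    feature, which is what makes the policy human-readable.
    """

    def __init__(self, weights, bias, left_leaf, right_leaf):
        self.w = np.asarray(weights, dtype=float)
        self.b = float(bias)
        self.left = float(left_leaf)    # action returned when the gate is ~1
        self.right = float(right_leaf)  # action returned when the gate is ~0

    def soft_forward(self, x):
        # Differentiable blend of the two leaf actions.
        p = sigmoid(self.w @ x + self.b)
        return p * self.left + (1.0 - p) * self.right

    def crisp_forward(self, x):
        # Sparse, interpretable rule: keep only the strongest feature
        # and turn the gate into a hard threshold test.
        k = int(np.argmax(np.abs(self.w)))
        threshold = -self.b / self.w[k]  # solves w_k * x_k + b = 0
        go_left = x[k] > threshold if self.w[k] > 0 else x[k] < threshold
        return self.left if go_left else self.right

# Example: a node whose crisp rule reads "if x[0] > 0.5, output 1.0, else 0.0"
node = SoftDecisionNode(weights=[2.0, 0.1], bias=-1.0,
                        left_leaf=1.0, right_leaf=0.0)
print(node.crisp_forward(np.array([0.8, 0.0])))  # 1.0
print(node.crisp_forward(np.array([0.2, 5.0])))  # 0.0
```

A full tree would stack such nodes, with linear controllers at the leaves for continuous control; the soft form is what allows modern gradient-based RL to optimize the structure directly.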