Paper Title
Learning Interpretable, High-Performing Policies for Autonomous Driving
Paper Authors
Paper Abstract
Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that match or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration.
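To give a feel for the core idea of optimizing a decision-tree-like policy by gradient descent, the sketch below shows a single soft decision node. This is an illustrative assumption, not the authors' implementation: during training the split is a differentiable sigmoid gate over a weighted feature combination, and for an interpretable ("crisp") policy the node keeps only its strongest feature and applies a hard threshold.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftDecisionNode:
    """Illustrative differentiable decision node (hypothetical sketch,
    not the ICCT authors' code).

    Training uses a soft sigmoid gate so gradients flow to the split
    weights; deployment can use a sparse, hard split over a single
    feature, which is what makes the policy human-readable.
    """

    def __init__(self, weights, bias, left_leaf, right_leaf):
        self.w = np.asarray(weights, dtype=float)
        self.b = float(bias)
        self.left = float(left_leaf)    # action returned when the gate is ~1
        self.right = float(right_leaf)  # action returned when the gate is ~0

    def soft_forward(self, x):
        # Differentiable blend of the two leaf actions.
        p = sigmoid(self.w @ x + self.b)
        return p * self.left + (1.0 - p) * self.right

    def crisp_forward(self, x):
        # Sparse, interpretable rule: keep only the strongest feature
        # and turn the gate into a hard threshold test.
        k = int(np.argmax(np.abs(self.w)))
        threshold = -self.b / self.w[k]  # solves w_k * x_k + b = 0
        go_left = x[k] > threshold if self.w[k] > 0 else x[k] < threshold
        return self.left if go_left else self.right

# Example: a node whose crisp rule reads "if x[0] > 0.5, output 1.0, else 0.0"
node = SoftDecisionNode(weights=[2.0, 0.1], bias=-1.0,
                        left_leaf=1.0, right_leaf=0.0)
print(node.crisp_forward(np.array([0.8, 0.0])))  # 1.0
print(node.crisp_forward(np.array([0.2, 5.0])))  # 0.0
```

A full tree would stack such nodes, with linear controllers at the leaves for continuous control; the soft form is what allows modern gradient-based RL to optimize the structure directly.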