Paper Title
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation
Paper Authors
Paper Abstract
Deep reinforcement learning (RL) has brought many successes to autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees, and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques that tackle these challenges in general, the lack of an open-source benchmark and reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose which learning methods to use for their mobile robots, and for learning researchers to identify current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata for applying deep RL approaches to autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. Then, we explore four major classes of learning techniques aimed at achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source, large-scale navigation benchmark and in real-world environments, we perform a comprehensive study aimed at establishing to what extent these techniques can achieve these desiderata for RL-based navigation systems.