Paper Title
Neuro-Planner: A 3D Visual Navigation Method for MAV with Depth Camera based on Neuromorphic Reinforcement Learning
Authors
Abstract
Traditional visual navigation methods for micro aerial vehicles (MAVs) usually compute a passable path that satisfies the constraints from a prior map. However, these methods suffer from a high demand for computing resources and poor robustness in unfamiliar environments. To address these problems, we propose a neuromorphic reinforcement learning method (Neuro-Planner) that combines a spiking neural network (SNN) with deep reinforcement learning (DRL) to realize MAV 3D visual navigation with a depth camera. Specifically, we design a spiking actor network based on two-state leaky integrate-and-fire (TS-LIF) neurons and its encoding-decoding schemes for efficient inference. Our improved hybrid deep deterministic policy gradient (HDDPG) and TS-LIF-based spatio-temporal back propagation (STBP) algorithms are then used as the training framework for the actor-critic network architecture. To verify the effectiveness of the proposed Neuro-Planner, we carry out detailed comparison experiments with various SNN training algorithms (STBP, BPTT and SLAYER) in a software-in-the-loop (SITL) simulation framework. The navigation success rate of our HDDPG-STBP is 4.3% and 5.3% higher than that of the original DDPG in the two evaluation environments. To the best of our knowledge, this is the first work combining neuromorphic computing and deep reinforcement learning for the MAV 3D visual navigation task.
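For readers unfamiliar with spiking networks, the encode, spike, decode pipeline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's TS-LIF neuron or its encoding-decoding schemes, which the abstract does not specify. It assumes a plain single-state LIF neuron with a hard reset, Bernoulli rate encoding, and mean-firing-rate decoding. All function names and the `decay`, `threshold`, and `gain` values are hypothetical.

```python
import random


def rate_encode(value, steps, rng):
    """Bernoulli rate coding (assumed scheme): a value in [0, 1] becomes a
    spike train whose per-step firing probability equals the value."""
    return [1 if rng.random() < value else 0 for _ in range(steps)]


def lif_run(input_current, decay=0.8, threshold=1.0):
    """Simulate one plain LIF neuron over a sequence of input currents.

    decay and threshold are illustrative values, not taken from the paper.
    """
    v = 0.0  # membrane potential
    spikes = []
    for current in input_current:
        v = decay * v + current  # leaky integration of the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after emitting a spike
        else:
            spikes.append(0)
    return spikes


def rate_decode(spikes):
    """Decode a spike train to a scalar as its mean firing rate."""
    return sum(spikes) / len(spikes)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    steps = 50
    gain = 0.6  # hypothetical synaptic weight applied to each input spike
    for value in (0.2, 0.9):
        train = rate_encode(value, steps, rng)
        out = lif_run([gain * s for s in train])
        print(value, rate_decode(out))
```

In an actor network of this kind, the decoded firing rates of the output neurons would typically be mapped to continuous action values. Training through the non-differentiable spike threshold, as STBP does, normally relies on a surrogate gradient, which this sketch leaves out.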