Embodied Synaptic Plasticity with Online Reinforcement learning

Paper Title

Embodied Synaptic Plasticity with Online Reinforcement learning

Authors

Jacques Kaiser, Michael Hoff, Andreas Konle, J. Camilo Vasquez Tieck, David Kappel, Daniel Reichard, Anand Subramoney, Robert Legenstein, Arne Roennau, Wolfgang Maass, Rudiger Dillmann

Abstract

The endeavor to understand the brain involves multiple collaborating research fields. Classically, synaptic plasticity rules derived by theoretical neuroscientists are evaluated in isolation on pattern classification tasks. This contrasts with the biological brain, whose purpose is to control a body in closed loop. This paper helps bring the fields of computational neuroscience and robotics closer together by integrating open-source software components from both fields. The resulting framework makes it possible to evaluate the validity of biologically plausible plasticity models in closed-loop robotics environments. We use this framework to evaluate Synaptic Plasticity with Online REinforcement learning (SPORE), a reward-learning rule based on synaptic sampling, on two visuomotor tasks: reaching and lane following. We show that SPORE can learn policies for both tasks over the course of simulated hours. Provisional parameter explorations indicate that the learning rate and the temperature driving the stochastic processes that govern synaptic learning dynamics need to be regulated for performance improvements to be retained. We conclude by discussing recent deep reinforcement learning techniques that would help increase the functionality of SPORE on visuomotor tasks.
