Paper title
Action-modulated midbrain dopamine activity arises from distributed control policies
Paper authors
Paper abstract
Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in classic models) and "action surprise," a measure of how unexpected an action is relative to the basal ganglia's current policy. In the presence of the action surprise term, the model implements an approximate form of Q-learning. On benchmark navigation and reaching tasks, we show empirically that this model is capable of learning from data driven completely or in part by other policies (e.g., from other brain regions). By contrast, models without the action surprise term suffer in the presence of additional policies, and are incapable of learning at all from behavior that is completely externally driven. The model provides a computational account for numerous experimental findings about dopamine activity that cannot be explained by classic models of reinforcement learning in the basal ganglia. These include differing levels of action surprise signals in dorsal and ventral striatum, decreasing amounts of movement-modulated dopamine activity with practice, and representations of action initiation and kinematics in dopamine activity. It also provides further predictions that can be tested with recordings of striatal dopamine activity.
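The abstract gives no equations, so the following is only a minimal tabular sketch of the general idea, not the authors' model. It assumes a hypothetical five-state chain task, an external policy that picks every action uniformly at random, and one possible reading of "action surprise" as V(s) - Q(s, a) with V(s) = max_a Q(s, a). Under these assumptions, the standard Q-learning error splits exactly into a state-value reward prediction error plus that surprise-like term, and the learner can still recover a good policy from behavior it never controlled.

```python
import random

N_STATES = 5          # hypothetical chain of states 0..4; state 4 is terminal and rewarded
LEFT, RIGHT = 0, 1
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Deterministic chain dynamics: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def decompose(q, state, action, reward, next_state, done):
    """Split the Q-learning error into a state-value RPE and a surprise-like term.

    Assumption (not from the paper): V(s) = max_a Q(s, a), surprise = V(s) - Q(s, a).
    """
    v_state = max(q[state])
    v_next = 0.0 if done else max(q[next_state])
    rpe = reward + GAMMA * v_next - v_state   # classic state-value prediction error
    surprise = v_state - q[state][action]     # zero for the greedy action, positive otherwise
    return rpe, surprise


def train(episodes=500, max_steps=200, seed=0):
    """Q-learning from behavior generated entirely by an external random policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice((LEFT, RIGHT))   # the learner never chooses the action
            next_state, reward, done = step(state, action)
            rpe, surprise = decompose(q, state, action, reward, next_state, done)
            # rpe + surprise equals reward + GAMMA * max Q(s', .) - Q(s, a)
            q[state][action] += ALPHA * (rpe + surprise)
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action per non-terminal state after learning from purely external behavior.
greedy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

In this toy setting the greedy policy should settle on moving right in every non-terminal state, even though the learner never selected an action itself. Dropping the `surprise` term from the update would leave only the state-value error, which is the same for both actions in a given transition and therefore carries no information about which externally chosen action was better.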