Paper title
Hierarchical Reinforcement Learning as a Model of Human Task Interleaving
Paper authors
Paper abstract
How do people decide how long to continue in a task, when to switch, and to which other task? Understanding the mechanisms that underpin task interleaving is a long-standing goal in the cognitive sciences. Prior work suggests greedy heuristics and a policy maximizing the marginal rate of return. However, it is unclear how such a strategy would allow for adaptation to everyday environments that offer multiple tasks with complex switch costs and delayed rewards. Here we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The supervisory level learns to switch using task-specific approximate utility estimates, which are computed on the lower level. A hierarchically optimal value function decomposition can be learned from experience, even in conditions with multiple tasks and arbitrary and uncertain reward and cost structures. The model reproduces known empirical effects of task interleaving. It yields better predictions of individual-level data than a myopic baseline in a six-task problem (N=211). The results support hierarchical RL as a plausible model of task interleaving.
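The abstract's two-level architecture — a supervisory controller that learns when to switch tasks, informed by approximate utility estimates computed at the lower, task-specific level — can be illustrated with a toy sketch. This is a minimal, illustrative reconstruction under assumed reward shapes and parameters (diminishing per-task returns, a fixed switch cost, tabular Q-learning at the supervisory level), not the authors' implementation:

```python
import random

NUM_TASKS = 3
SWITCH_COST = 0.5            # assumed fixed cost for switching tasks
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def task_reward(task, progress):
    """Lower level: per-task reward with diminishing returns (assumed form)."""
    return (task + 1) * 1.0 / (1 + progress)

def lower_level_utility(task, progress):
    """Approximate utility estimate the lower level reports to the supervisor."""
    return task_reward(task, progress)

# Supervisory level: tabular Q-values over (current task, capped progress).
Q = {}

def supervisory_policy(state, utilities):
    """Epsilon-greedy choice; unseen Q entries fall back on lower-level estimates."""
    if random.random() < EPSILON:
        return random.randrange(NUM_TASKS)
    return max(range(NUM_TASKS),
               key=lambda a: Q.get((state, a), utilities[a]))

def run_episode(steps=50):
    current, progress, total = 0, [0] * NUM_TASKS, 0.0
    for _ in range(steps):
        state = (current, min(progress[current], 5))
        utilities = [lower_level_utility(t, progress[t]) for t in range(NUM_TASKS)]
        action = supervisory_policy(state, utilities)
        reward = task_reward(action, progress[action])
        if action != current:
            reward -= SWITCH_COST          # switching carries a cost
        progress[action] += 1
        next_state = (action, min(progress[action], 5))
        best_next = max(Q.get((next_state, a), 0.0) for a in range(NUM_TASKS))
        q = Q.get((state, action), 0.0)
        Q[(state, action)] = q + ALPHA * (reward + GAMMA * best_next - q)
        current, total = action, total + reward
    return total

random.seed(0)
returns = [run_episode() for _ in range(200)]
print(f"mean return, last 20 episodes: {sum(returns[-20:]) / 20:.2f}")
```

With diminishing per-task returns, the learned supervisory policy tends to rotate among tasks, trading the switch cost against the higher marginal reward of a fresh task — the qualitative pattern the abstract contrasts with purely myopic switching.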