Paper Title
Time Adaptive Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) makes it possible to solve complex tasks, such as the game of Go, often with stronger performance than humans. However, the learned behaviors are usually fixed to specific tasks and unable to adapt to different contexts. Here we consider the case of adapting RL agents to different time restrictions, such as finishing a task within a given time limit that might change from one task execution to the next. We define such problems as Time Adaptive Markov Decision Processes and introduce two model-free, value-based algorithms: the Independent Gamma-Ensemble and the n-Step Ensemble. In contrast to classical approaches, they allow zero-shot adaptation between different time restrictions. The proposed approaches represent general mechanisms for handling time-adaptive tasks, making them compatible with many existing RL methods, algorithms, and scenarios.
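The abstract gives no implementation details, but the core idea behind the Independent Gamma-Ensemble can be illustrated with a short sketch: several Q-functions with different discount factors are learned in parallel from the same transitions, and at decision time the agent selects the member best suited to the current time limit. The following is a minimal, hypothetical sketch under those assumptions; the class name, the tabular Q-learning setup, and the horizon-matching selection rule are illustrative choices for this example, not the paper's exact formulation.

```python
import numpy as np

class IndependentGammaEnsemble:
    """Tabular Q-learning ensemble: one Q-table per discount factor gamma.

    Every member is updated from the same transitions, so maintaining the
    ensemble costs no extra environment interaction.
    """

    def __init__(self, n_states, n_actions, gammas, alpha=0.1):
        self.gammas = np.asarray(gammas, dtype=float)  # e.g. [0.5, 0.9, 0.99]
        self.alpha = alpha                             # learning rate
        self.q = np.zeros((len(gammas), n_states, n_actions))

    def update(self, s, a, r, s_next, done):
        # Standard Q-learning target, computed for all gammas at once.
        bootstrap = 0.0 if done else self.q[:, s_next].max(axis=1)
        target = r + self.gammas * bootstrap
        self.q[:, s, a] += self.alpha * (target - self.q[:, s, a])

    def act(self, s, time_left):
        # Hypothetical horizon-matching rule (an assumption of this sketch):
        # pick the member whose effective horizon 1 / (1 - gamma) is closest
        # to the remaining time, then act greedily under its Q-values.
        horizons = 1.0 / (1.0 - self.gammas)
        member = int(np.argmin(np.abs(horizons - time_left)))
        return int(np.argmax(self.q[member, s]))


# Usage: learn from a transition, then act under two different time limits
# without any retraining (the zero-shot adaptation described in the abstract).
ens = IndependentGammaEnsemble(n_states=5, n_actions=2, gammas=[0.5, 0.9, 0.99])
ens.update(s=0, a=1, r=1.0, s_next=2, done=False)
print(ens.act(s=0, time_left=2))   # short deadline -> short-horizon member
print(ens.act(s=0, time_left=50))  # long deadline  -> long-horizon member
```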