Title
Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges
Authors
Abstract
Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks while addressing the limitations of standard deep learning approaches, such as catastrophic forgetting. In this work, we investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task (MTL) agents. We pose two hypotheses: (1) task-agnostic methods might provide advantages in settings with limited data, computation, or high dimensionality, and (2) faster adaptation may be particularly beneficial in continual learning settings, helping to mitigate the effects of catastrophic forgetting. To investigate these hypotheses, we introduce a replay-based recurrent reinforcement learning (3RL) methodology for task-agnostic CL agents. We assess 3RL on a synthetic task and the Meta-World benchmark, which includes 50 unique manipulation tasks. Our results demonstrate that 3RL outperforms baseline methods and can even surpass its multi-task equivalent in challenging settings with high dimensionality. We also show that the recurrent task-agnostic agent consistently outperforms or matches the performance of its transformer-based counterpart. These findings provide insights into the advantages of task-agnostic CL over task-aware MTL approaches and highlight the potential of task-agnostic methods in resource-constrained, high-dimensional, and multi-task environments.
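To make the core idea concrete, the sketch below illustrates the two ingredients the abstract combines: a replay buffer that stores trajectory *segments* (so a recurrent policy can be trained on sequences), and a recurrent state update that summarizes recent history, letting a task-agnostic agent infer the current task from observations alone. This is a minimal illustration under our own simplifying assumptions (`SegmentReplayBuffer`, `recurrent_step`, and the toy tanh update are hypothetical names, not the paper's 3RL implementation).

```python
# Illustrative sketch only: segment-based replay plus a toy recurrent
# update, in the spirit of replay-based recurrent RL (3RL). Not the
# authors' code; all names and the update rule are our simplifications.
import random
from collections import deque

import numpy as np


class SegmentReplayBuffer:
    """Stores fixed-length trajectory segments so a recurrent policy
    can be trained on sequences rather than isolated transitions."""

    def __init__(self, capacity, segment_len):
        self.buffer = deque(maxlen=capacity)
        self.segment_len = segment_len

    def add_episode(self, transitions):
        # Slice an episode into non-overlapping fixed-length segments.
        for i in range(0, len(transitions) - self.segment_len + 1,
                       self.segment_len):
            self.buffer.append(transitions[i:i + self.segment_len])

    def sample(self, batch_size):
        # Uniformly sample stored segments for a training batch.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def recurrent_step(h, x, W_h, W_x):
    """Toy recurrent update: the hidden state h accumulates recent
    history, which is what allows a task-agnostic agent to identify
    the task it is currently facing without an explicit task label."""
    return np.tanh(W_h @ h + W_x @ x)
```

In a full agent, the sampled segments would be unrolled through the recurrent network to rebuild the hidden state before computing the RL loss; the sketch only shows the data path that makes that possible.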