Paper Title
ImpressLearn: Continual Learning via Combined Task Impressions
Paper Authors
Paper Abstract
This work proposes a new method to sequentially train deep neural networks on multiple tasks without suffering catastrophic forgetting, while endowing them with the capability to quickly adapt to unseen tasks. Starting from existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks (impressions) on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on unseen tasks. In contrast to previous methods, we do not need to generate dedicated masks or contexts for each new task; instead, we leverage transfer learning to keep the per-task parameter overhead small. Our work illustrates the power of linearly combining individual impressions, each of which fares poorly in isolation, to achieve performance comparable to a dedicated mask. Moreover, even repeated impressions from the same task (homogeneous masks), when combined, can approach the performance of heterogeneous combinations if sufficiently many impressions are used. Our approach scales more efficiently than existing methods, often requiring orders of magnitude fewer parameters, and it can function without modification even when task identity is missing. In addition, in the setting where task labels are not given at inference, our algorithm offers an often favorable alternative to the one-shot procedure used by Wortsman et al. (2020). We evaluate our method on a number of well-known image classification datasets and network architectures.
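To make the core idea concrete, below is a minimal PyTorch sketch of a single masked layer in this spirit: frozen random backbone weights gated by a learned linear combination of fixed binary impressions. It is an illustration under stated assumptions, not the authors' implementation; the class name ImpressionLinear, the sigmoid used to turn the combined impressions into a soft mask, and the zero-initialized mixing coefficients are all hypothetical choices made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImpressionLinear(nn.Module):
    """Sketch: a linear layer whose frozen random weights are gated by a
    learned linear combination of fixed binary impressions (supermasks)."""

    def __init__(self, in_features: int, out_features: int,
                 impressions: torch.Tensor):
        super().__init__()
        # Randomly initialized backbone weights, never trained.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Fixed binary impressions, shape (num_impressions, out, in).
        self.register_buffer("impressions", impressions.float())
        # The only per-task trainable parameters: one coefficient per impression.
        self.coeffs = nn.Parameter(torch.zeros(impressions.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear combination of impressions -> soft mask in (0, 1).
        # (Squashing with a sigmoid is an assumption of this sketch.)
        combined = torch.einsum("k,koi->oi", self.coeffs, self.impressions)
        mask = torch.sigmoid(combined)
        return F.linear(x, self.weight * mask)

# Hypothetical usage: three random binary impressions for a 4 -> 2 layer.
imps = torch.rand(3, 2, 4) > 0.5
layer = ImpressionLinear(4, 2, imps)
out = layer(torch.randn(8, 4))  # only layer.coeffs receives gradients
```

The abstract's claim about small per-task overhead is visible here: adapting this layer to a new task only requires training coeffs, one scalar per impression, while the random backbone weights and the binary impressions stay fixed.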