Paper Title
Supermasks in Superposition
Paper Authors
Paper Abstract
We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks which minimizes the output entropy. In practice, we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks. We also showcase two promising extensions. First, SupSup models can be trained entirely without task identity information, as they may detect when they are uncertain about new data and allocate an additional supermask for the new training distribution. Finally, the entire, growing set of supermasks can be stored in a constant-sized reservoir by implicitly storing them as attractors in a fixed-sized Hopfield network.
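As a rough illustration of the task-inference step described in the abstract, the following minimal PyTorch sketch superimposes per-task supermasks with coefficients alpha, takes a single gradient step on the output entropy, and selects the task whose coefficient receives the most negative gradient. All names, shapes, and the random stand-in masks are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of SupSup-style task inference (toy single-layer model).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_tasks, in_dim, out_dim = 5, 20, 10

# Fixed, randomly initialized base weights (never trained).
W = torch.randn(out_dim, in_dim)

# One binary supermask per learned task (random stand-ins here).
masks = (torch.rand(num_tasks, out_dim, in_dim) > 0.5).float()

def infer_task(x):
    """Infer task identity for a batch x via one gradient step on the
    entropy of the output under a linear superposition of supermasks."""
    # alpha weights the superposition; start uniform over tasks.
    alpha = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)

    # Superimposed effective weights: sum_i alpha_i * (mask_i * W).
    W_super = torch.einsum('t,toi->oi', alpha, masks) * W
    logits = x @ W_super.t()

    # Output entropy, averaged over the batch.
    p = F.softmax(logits, dim=-1)
    entropy = -(p * (p + 1e-12).log()).sum(dim=-1).mean()
    entropy.backward()

    # One gradient step is often enough: the task whose mask most
    # decreases the entropy (most negative gradient) is selected.
    return int(torch.argmin(alpha.grad))

x = torch.randn(8, in_dim)
print("inferred task:", infer_task(x))
```

In this toy setting the masks are random, so the selected task is arbitrary; with masks trained per task as in the paper, the correct mask is the one under which the superimposed network produces confident (low-entropy) outputs on data from that task.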