Paper Title
SPACE: Structured Compression and Sharing of Representational Space for Continual Learning
Paper Authors
Paper Abstract
Humans learn adaptively and efficiently throughout their lives. However, incrementally learning tasks causes artificial neural networks to overwrite relevant information learned about older tasks, resulting in 'Catastrophic Forgetting'. Efforts to overcome this phenomenon often utilize resources poorly, for instance, by growing the network architecture or needing to save parametric importance scores, or violate data privacy between tasks. To tackle this, we propose SPACE, an algorithm that enables a network to learn continually and efficiently by partitioning the learnt space into a Core space, that serves as the condensed knowledge base over previously learned tasks, and a Residual space, which is akin to a scratch space for learning the current task. After learning each task, the Residual is analyzed for redundancy, both within itself and with the learnt Core space. A minimal number of extra dimensions required to explain the current task are added to the Core space and the remaining Residual is freed up for learning the next task. We evaluate our algorithm on P-MNIST, CIFAR and a sequence of 8 different datasets, and achieve comparable accuracy to the state-of-the-art methods while overcoming catastrophic forgetting. Additionally, our algorithm is well suited for practical use. The partitioning algorithm analyzes all layers in one shot, ensuring scalability to deeper networks. Moreover, the analysis of dimensions translates to filter-level sparsity, and the structured nature of the resulting architecture gives us up to 5x improvement in energy efficiency during task inference over the current state-of-the-art.
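To make the Core/Residual partitioning described above more concrete, the sketch below illustrates one plausible way such an update could work for a single layer: activations of the current task are projected out of the existing Core basis, the leftover Residual is analyzed with an SVD, and only the fewest directions needed to reach a variance threshold are appended to the Core. This is not the authors' implementation; the function and parameter names (`update_core_space`, `var_threshold`) and the variance-based selection criterion are illustrative assumptions based only on the abstract.

```python
# Minimal sketch of a Core/Residual space update for one layer.
# Assumption-based illustration; not the official SPACE code.
import numpy as np

def update_core_space(activations, core_basis=None, var_threshold=0.99):
    """Return an enlarged orthonormal Core basis for one layer.

    activations   : (n_samples, n_features) activations collected on the current task.
    core_basis    : (n_features, k) orthonormal basis kept from previous tasks, or None.
    var_threshold : fraction of the current task's activation variance that the
                    (old + new) Core directions must explain (hypothetical criterion).
    """
    A = activations - activations.mean(axis=0)            # center the features
    total_var = np.sum(A ** 2)

    if core_basis is not None:
        # Remove the part of the activations already explained by the Core.
        A_res = A - (A @ core_basis) @ core_basis.T
        explained = total_var - np.sum(A_res ** 2)
    else:
        A_res, explained = A, 0.0

    # SVD of the Residual: directions ordered by how much variance each one adds.
    _, s, Vt = np.linalg.svd(A_res, full_matrices=False)
    new_dirs = []
    for i in range(len(s)):
        if explained / total_var >= var_threshold:
            break                                          # Core already suffices
        explained += s[i] ** 2
        new_dirs.append(Vt[i])                             # keep a minimal extra direction

    new_basis = (np.array(new_dirs).T if new_dirs
                 else np.empty((A.shape[1], 0)))
    return new_basis if core_basis is None else np.hstack([core_basis, new_basis])
```

Under this reading, dimensions left out of the Core correspond to directions (and, for convolutional layers, whole filters) that can be freed up as the Residual for the next task, which is how the analysis would translate into the filter-level, structured sparsity mentioned in the abstract.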