Paper Title
Alleviating Representational Shift for Continual Fine-tuning
Paper Authors
Paper Abstract
We study a practical setting of continual learning: fine-tuning a pre-trained model continually. Previous work has found that, when training on new tasks, the features (penultimate-layer representations) of previous data change, a phenomenon called representational shift. Beyond the shift of features, we reveal that the intermediate layers' representational shift (IRS) also matters, since it disrupts batch normalization, which is another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method incorporating two components: cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones and recovers the post-convolution means before testing, which corrects the inaccurate mean estimates caused by IRS. Hierarchical fine-tuning leverages a multi-stage strategy to fine-tune the pre-trained network, preventing massive changes in the Conv layers and thus alleviating IRS. Experimental results on four datasets show that our method remarkably outperforms several state-of-the-art methods while requiring lower storage overhead.
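The sketch below illustrates, in PyTorch, the Xconv BN idea as described in the abstract only: a Conv + BN block that tracks the running mean of its pre-convolution input and, before evaluation, recovers the post-convolution mean by pushing the stored input mean through the current convolution weights, exploiting the linearity of convolution. The class name XconvBN, the buffer pre_running_mean, and the helper recover_post_conv_mean are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class XconvBN(nn.Module):
    """Minimal sketch of a Conv + BN block in the spirit of Xconv BN.

    It keeps a running mean of the *input* to the convolution and, before
    evaluation, re-derives the post-convolution mean from the stored input
    mean and the current convolution weights. Only the mean is corrected;
    variance handling is left to the standard BN statistics.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, momentum=0.1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=True)
        self.bn = nn.BatchNorm2d(out_ch, momentum=momentum)
        self.momentum = momentum
        # Running mean of the pre-convolution input, one value per input channel.
        self.register_buffer("pre_running_mean", torch.zeros(in_ch))

    def forward(self, x):
        if self.training:
            # Update the pre-convolution running mean (per input channel).
            batch_mean = x.mean(dim=(0, 2, 3)).detach()
            self.pre_running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
        return self.bn(self.conv(x))

    @torch.no_grad()
    def recover_post_conv_mean(self):
        """Recover the post-convolution running mean from the stored
        pre-convolution mean, assuming spatially stationary inputs and
        ignoring boundary effects of padding."""
        w = self.conv.weight                      # [out_ch, in_ch, k, k]
        post_mean = w.sum(dim=(2, 3)) @ self.pre_running_mean  # [out_ch]
        if self.conv.bias is not None:
            post_mean = post_mean + self.conv.bias
        self.bn.running_mean.copy_(post_mean)
```

In use, one would call recover_post_conv_mean() on every such block before switching the network to eval mode, so that BN normalizes with means consistent with the shifted intermediate representations of the current convolution weights.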