Paper Title

Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Authors

Donghyeon Baek, Youngmin Oh, Sanghoon Lee, Junghyup Lee, Bumsub Ham

Abstract

Class-incremental semantic segmentation (CISS) continually labels each pixel of an image with a corresponding object/stuff class. To this end, it is crucial to learn novel classes incrementally without forgetting previously learned knowledge. Current CISS methods typically use a knowledge distillation (KD) technique to preserve classifier logits, or freeze a feature extractor, to avoid the forgetting problem. These strong constraints, however, prevent the model from learning discriminative features for novel classes. We introduce a CISS framework that alleviates the forgetting problem and facilitates learning novel classes effectively. We have found that a logit can be decomposed into two terms, which quantify how likely it is that an input belongs to a particular class or not, providing a clue to the model's reasoning process. The KD technique, in this context, preserves only the sum of the two terms (i.e., a class logit), meaning that each term can change while the sum stays fixed; the KD therefore does not imitate the reasoning process. To impose constraints on each term explicitly, we propose a new decomposed knowledge distillation (DKD) technique, improving the rigidity of the model and addressing the forgetting problem more effectively. We also introduce a novel initialization method to train new classifiers for novel classes. In CISS, the number of negative training samples for novel classes is not sufficient to discriminate them from old classes. To mitigate this, we propose to transfer knowledge of negatives to the classifiers successively using an auxiliary classifier, boosting the performance significantly. Experimental results on standard CISS benchmarks demonstrate the effectiveness of our framework.
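The abstract's key observation can be illustrated with a minimal NumPy sketch. This is a hypothetical reading of the idea, not the authors' exact formulation: a class logit is split into a positive and a negative term (here, by the sign of the per-dimension contributions w_i * f_i), vanilla KD penalizes only a change in the summed logit, while a decomposed loss penalizes each term separately.

```python
import numpy as np

def decompose_logit(w, f):
    """Split the class logit w·f into a positive and a negative term.

    Positive contributions w_i * f_i form z_pos; the magnitudes of the
    negative contributions form z_neg, so the logit equals z_pos - z_neg.
    (Illustrative decomposition only; the paper's definition may differ.)
    """
    contrib = w * f
    z_pos = contrib[contrib > 0].sum()
    z_neg = -contrib[contrib < 0].sum()
    return z_pos, z_neg

def kd_loss(z_old, z_new):
    """Vanilla KD (squared error here for simplicity): constrains only
    the summed logit."""
    return (z_old - z_new) ** 2

def dkd_loss(pos_old, neg_old, pos_new, neg_new):
    """Decomposed KD: constrains each term of the logit explicitly."""
    return (pos_old - pos_new) ** 2 + (neg_old - neg_new) ** 2

# Example: the new model shifts both terms by the same amount, so the
# summed logit is unchanged and vanilla KD sees no drift, but the
# decomposed loss detects the change in the reasoning process.
pos_old, neg_old = 3.0, 1.0   # old-model terms, logit = 2.0
pos_new, neg_new = 4.0, 2.0   # new-model terms, logit still 2.0
print(kd_loss(pos_old - neg_old, pos_new - neg_new))   # 0.0
print(dkd_loss(pos_old, neg_old, pos_new, neg_new))    # 2.0
```

The example shows why preserving only the class logit is a weaker constraint: both terms can drift in tandem without any KD penalty, which is exactly the gap DKD is designed to close.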
