Paper Title
Channel Self-Supervision for Online Knowledge Distillation
Paper Authors
Paper Abstract
Recently, researchers have shown increased interest in online knowledge distillation. Adopting a one-stage, end-to-end training fashion, online knowledge distillation uses the aggregated intermediate predictions of multiple peer models for training. However, the absence of a powerful teacher model may cause a homogenization problem among group peers, adversely affecting the effectiveness of group distillation. In this paper, we propose a novel online knowledge distillation method, \textbf{C}hannel \textbf{S}elf-\textbf{S}upervision for Online Knowledge Distillation (CSS), which constructs diversity in terms of input, target, and network to alleviate the homogenization problem. Specifically, we build a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning, adopting a feature-level transformation and augmenting the corresponding labels. Meanwhile, the dual-network structure has a larger space of independent parameters to resist the homogenization problem during distillation. Extensive quantitative experiments on CIFAR-100 demonstrate that our method provides greater diversity than OKDDip and achieves considerable performance gains, even over state-of-the-art methods such as PCL. Results on three fine-grained datasets (StanfordDogs, StanfordCars, CUB-200-2011) also show the strong generalization capability of our approach.
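The abstract describes two ingredients: group-based online distillation, where each peer branch mimics the aggregated predictions of the others, and a self-supervision-style objective whose labels are augmented according to a feature-level transformation. The sketch below illustrates how such a combination might be wired up in PyTorch. It is a minimal, hypothetical example: the backbone, the channel/feature transform (a spatial rotation is used as a stand-in), the auxiliary head, and all names and hyperparameters are assumptions for illustration, not the paper's actual CSS implementation.

```python
# Minimal sketch: online (group) knowledge distillation + a self-supervision-style
# auxiliary task with augmented labels. Illustrative only; not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchNet(nn.Module):
    """Tiny backbone with several peer branches sharing low-level layers."""

    def __init__(self, num_classes: int, num_branches: int = 3, num_transforms: int = 4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.branches = nn.ModuleList(
            [nn.Linear(32 * 4 * 4, num_classes) for _ in range(num_branches)]
        )
        # Auxiliary head predicts the joint (class, transform) label, i.e. an augmented label.
        self.aux_head = nn.Linear(32 * 4 * 4, num_classes * num_transforms)
        self.num_transforms = num_transforms

    def forward(self, x):
        feat = self.shared(x)                      # (B, 32, 4, 4) feature map
        flat = feat.flatten(1)
        logits = [branch(flat) for branch in self.branches]
        return feat, logits


def feature_transform(feat: torch.Tensor, k: int) -> torch.Tensor:
    """Illustrative feature-level transformation: rotate the spatial map by k * 90 degrees."""
    return torch.rot90(feat, k, dims=(2, 3))


def css_style_loss(model: MultiBranchNet, x, y, T: float = 3.0):
    feat, logits = model(x)

    # Standard cross-entropy for every peer branch.
    ce = sum(F.cross_entropy(l, y) for l in logits)

    # Online distillation: each branch matches the averaged soft targets of its peers.
    kd = 0.0
    for i, l in enumerate(logits):
        peers = torch.stack([p for j, p in enumerate(logits) if j != i]).mean(0)
        kd = kd + F.kl_div(
            F.log_softmax(l / T, dim=1),
            F.softmax(peers.detach() / T, dim=1),
            reduction="batchmean",
        ) * (T * T)

    # Self-supervision with augmented labels: apply a random feature-level transform
    # and ask the auxiliary head to predict the joint (class, transform) index.
    k = int(torch.randint(model.num_transforms, (1,)))
    t_feat = feature_transform(feat, k).flatten(1)
    aux_logits = model.aux_head(t_feat)
    aux_target = y * model.num_transforms + k     # augmented label
    ss = F.cross_entropy(aux_logits, aux_target)

    return ce + kd + ss
```

A dual-network variant of this sketch would instantiate two such multi-branch models with independent parameters and distill across both, which is one plausible reading of the "larger space of independent parameters" mentioned in the abstract.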