Paper Title
Channel Self-Supervision for Online Knowledge Distillation
Paper Authors
Paper Abstract
Recently, researchers have shown increased interest in online knowledge distillation. Adopting a one-stage, end-to-end training fashion, online knowledge distillation uses the aggregated intermediate predictions of multiple peer models for training. However, the absence of a powerful teacher model may cause a homogenization problem among group peers, adversely affecting the effectiveness of group distillation. In this paper, we propose a novel online knowledge distillation method, \textbf{C}hannel \textbf{S}elf-\textbf{S}upervision for Online Knowledge Distillation (CSS), which constructs diversity in terms of input, target, and network to alleviate the homogenization problem. Specifically, we build a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning, adopting a feature-level transformation and augmenting the corresponding labels. Meanwhile, the dual-network structure has a larger space of independent parameters to resist the homogenization problem during distillation. Extensive quantitative experiments on CIFAR-100 demonstrate that our method provides greater diversity than OKDDip and achieves considerable performance gains, even over state-of-the-art methods such as PCL. Results on three fine-grained datasets (StanfordDogs, StanfordCars, CUB-200-2011) also show the strong generalization capability of our approach.
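The abstract describes two ingredients: group-based online distillation, where each peer branch mimics the aggregated predictions of the others, and a self-supervision-style objective whose labels are augmented according to a feature-level transformation. The sketch below illustrates how such a combination might be wired up in PyTorch. It is a minimal, hypothetical example: the backbone, the channel/feature transform (a spatial rotation is used as a stand-in), the auxiliary head, and all names and hyperparameters are assumptions for illustration, not the paper's actual CSS implementation.

```python
# Minimal sketch: online (group) knowledge distillation + a self-supervision-style
# auxiliary task with augmented labels. Illustrative only; not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchNet(nn.Module):
    """Tiny backbone with several peer branches sharing low-level layers."""

    def __init__(self, num_classes: int, num_branches: int = 3, num_transforms: int = 4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.branches = nn.ModuleList(
            [nn.Linear(32 * 4 * 4, num_classes) for _ in range(num_branches)]
        )
        # Auxiliary head predicts the joint (class, transform) label, i.e. an augmented label.
        self.aux_head = nn.Linear(32 * 4 * 4, num_classes * num_transforms)
        self.num_transforms = num_transforms

    def forward(self, x):
        feat = self.shared(x)                      # (B, 32, 4, 4) feature map
        flat = feat.flatten(1)
        logits = [branch(flat) for branch in self.branches]
        return feat, logits


def feature_transform(feat: torch.Tensor, k: int) -> torch.Tensor:
    """Illustrative feature-level transformation: rotate the spatial map by k * 90 degrees."""
    return torch.rot90(feat, k, dims=(2, 3))


def css_style_loss(model: MultiBranchNet, x, y, T: float = 3.0):
    feat, logits = model(x)

    # Standard cross-entropy for every peer branch.
    ce = sum(F.cross_entropy(l, y) for l in logits)

    # Online distillation: each branch matches the averaged soft targets of its peers.
    kd = 0.0
    for i, l in enumerate(logits):
        peers = torch.stack([p for j, p in enumerate(logits) if j != i]).mean(0)
        kd = kd + F.kl_div(
            F.log_softmax(l / T, dim=1),
            F.softmax(peers.detach() / T, dim=1),
            reduction="batchmean",
        ) * (T * T)

    # Self-supervision with augmented labels: apply a random feature-level transform
    # and ask the auxiliary head to predict the joint (class, transform) index.
    k = int(torch.randint(model.num_transforms, (1,)))
    t_feat = feature_transform(feat, k).flatten(1)
    aux_logits = model.aux_head(t_feat)
    aux_target = y * model.num_transforms + k     # augmented label
    ss = F.cross_entropy(aux_logits, aux_target)

    return ce + kd + ss
```

A dual-network variant of this sketch would instantiate two such multi-branch models with independent parameters and distill across both, which is one plausible reading of the "larger space of independent parameters" mentioned in the abstract.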