Paper Title
On Self-Distilling Graph Neural Network
Paper Authors
Paper Abstract
Recently, the teacher-student knowledge distillation framework has demonstrated its potential for training Graph Neural Networks (GNNs). However, due to the difficulty of training over-parameterized GNN models, one may not easily obtain a satisfactory teacher model for distillation. Furthermore, the inefficient training process of teacher-student knowledge distillation also impedes its application to GNN models. In this paper, we propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD), which serves as a drop-in replacement for the standard training process. The method is built upon the proposed neighborhood discrepancy rate (NDR), which efficiently quantifies the non-smoothness of the embedded graph. Based on this metric, we propose the adaptive discrepancy retaining (ADR) regularizer, which improves the transferability of knowledge by maintaining a high neighborhood discrepancy across GNN layers. We also summarize a generic GNN-SD framework that can be exploited to induce other distillation strategies. Experiments further demonstrate the effectiveness and generalization of our approach, as it brings: 1) state-of-the-art GNN distillation performance with less training cost, and 2) consistent and considerable performance enhancement for various popular backbones.
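To make the NDR idea more concrete, below is a minimal, hypothetical sketch rather than the paper's exact formulation: it assumes the discrepancy for a node is one minus the cosine similarity between the node's embedding and the mean embedding of its neighbors, so larger values indicate a less smoothed (more distinctive) node. The function name `neighborhood_discrepancy_rate` and the mean-aggregation choice are illustrative assumptions.

```python
# Hypothetical sketch of a neighborhood discrepancy rate (NDR) computation.
# Assumption: NDR(v) = 1 - cosine(h_v, mean of neighbor embeddings of v).
import numpy as np

def neighborhood_discrepancy_rate(h, neighbors):
    """Per-node discrepancy between a node's embedding and its aggregated neighborhood.

    h:         (num_nodes, dim) array of node embeddings at one GNN layer.
    neighbors: list of index lists; neighbors[v] holds the neighbor ids of node v.
    """
    num_nodes = h.shape[0]
    ndr = np.zeros(num_nodes)
    for v in range(num_nodes):
        if not neighbors[v]:  # isolated node: no neighborhood to compare against
            continue
        agg = h[neighbors[v]].mean(axis=0)                      # aggregated neighborhood embedding
        num = float(h[v] @ agg)
        den = np.linalg.norm(h[v]) * np.linalg.norm(agg) + 1e-12
        ndr[v] = 1.0 - num / den                                # high value = node differs from its neighborhood
    return ndr

# Toy usage: 4 nodes on a path graph with 3-dimensional embeddings.
h = np.random.randn(4, 3)
adj = [[1], [0, 2], [1, 3], [2]]
print(neighborhood_discrepancy_rate(h, adj))
```

Under this reading, an ADR-style regularizer would compare such per-node scores across layers and penalize deeper layers whose discrepancy collapses toward zero, i.e., over-smoothed representations; the exact penalty form used by GNN-SD should be taken from the paper itself.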