Paper Title
PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient
Paper Authors
Paper Abstract
Knowledge distillation (KD) is a widely used technique for training compact models in object detection. However, there is still a lack of study on how to distill between heterogeneous detectors. In this paper, we empirically find that better FPN features from a heterogeneous teacher detector can help the student, even though their detection heads and label assignments differ. However, directly aligning the feature maps to distill detectors suffers from two problems. First, the difference in feature magnitude between the teacher and the student can impose overly strict constraints on the student. Second, the FPN stages and channels with large feature magnitudes in the teacher model can dominate the gradient of the distillation loss, overwhelming the effects of the other features in KD and introducing much noise. To address these issues, we propose to imitate features via the Pearson correlation coefficient, which focuses on the relational information from the teacher and relaxes the constraints on feature magnitude. Our method consistently outperforms existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs. Furthermore, it converges faster. With a powerful MaskRCNN-Swin detector as the teacher, ResNet-50 based RetinaNet and FCOS achieve 41.5% and 43.9% mAP on COCO2017, which are 4.1% and 4.8% higher than their baselines, respectively.
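
The Pearson-based imitation described in the abstract can be realized by standardizing each channel of the student and teacher FPN features to zero mean and unit variance and then minimizing their MSE, which is proportional to one minus the per-channel Pearson correlation coefficient. Below is a minimal PyTorch sketch of this idea under that reading of the abstract; the function name pkd_loss, the eps constant, and the usage tensors are illustrative, not the authors' released code.

import torch
import torch.nn.functional as F

def pkd_loss(feat_s: torch.Tensor, feat_t: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Imitate teacher features via Pearson correlation: standardize each
    # channel to zero mean / unit variance, then take the MSE, which is
    # proportional to (1 - Pearson correlation coefficient) per channel.
    # feat_s, feat_t: (N, C, H, W) student / teacher FPN feature maps,
    # assumed to already have matching shapes (e.g. after a 1x1 adapter).
    assert feat_s.shape == feat_t.shape

    def standardize(feat: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feat.shape
        flat = feat.permute(1, 0, 2, 3).reshape(c, -1)   # (C, N*H*W)
        mean = flat.mean(dim=1, keepdim=True)
        std = flat.std(dim=1, keepdim=True)
        flat = (flat - mean) / (std + eps)               # zero mean, unit variance
        return flat.reshape(c, n, h, w).permute(1, 0, 2, 3)

    # Detach the teacher so gradients only flow into the student.
    return F.mse_loss(standardize(feat_s), standardize(feat_t.detach())) / 2

# Usage sketch on random tensors standing in for one FPN level:
student_feat = torch.randn(2, 256, 32, 32)
teacher_feat = torch.randn(2, 256, 32, 32)
loss = pkd_loss(student_feat, teacher_feat)

Because the standardization removes per-channel mean and scale, this loss is invariant to the magnitude gap between teacher and student features, which is exactly the property the abstract motivates; in practice it would be summed over FPN levels and added to the detector's task loss with a weighting coefficient.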