Paper Title
Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection
Paper Authors
Paper Abstract
This paper focuses on Semi-Supervised Object Detection (SSOD). Knowledge Distillation (KD) has been widely used for semi-supervised image classification. However, adapting these methods to SSOD faces two obstacles. (1) The teacher model serves a dual role as both teacher and student, so the teacher's predictions on unlabeled images may be very close to those of the student, which limits the upper bound of the student. (2) The class imbalance issue in SSOD hinders efficient knowledge transfer from teacher to student. To address these problems, we propose a novel method, Temporal Self-Ensembling Teacher (TSE-T), for SSOD. Unlike previous KD-based methods, we devise a temporally evolved teacher model. First, our teacher model ensembles its temporal predictions for unlabeled images under stochastic perturbations. Second, our teacher model ensembles its temporal model weights with the student model weights by an exponential moving average (EMA), which allows the teacher to gradually learn from the student. These self-ensembling strategies increase data and model diversity, thus improving teacher predictions on unlabeled images. Finally, we use focal loss to formulate a consistency regularization term that handles the data imbalance problem; this uses the information in unlabeled images more efficiently than a simple hard-thresholding method, which preserves only confident predictions. Evaluated on the widely used VOC and COCO benchmarks, our method achieves an mAP of 80.73% on the VOC2007 test set and 40.52% on the COCO2014 minval5k set, outperforming a strong fully-supervised detector by 2.37% and 1.49%, respectively. Furthermore, our method sets a new state of the art in SSOD on the VOC2007 test set, outperforming the baseline SSOD method by 1.44%. The source code of this work is publicly available at http://github.com/syangdong/tse-t.
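The two mechanisms named in the abstract, the EMA weight update and the focal-loss consistency term, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ema_update` and `focal_consistency`, the decay `alpha`, the focusing parameter `gamma`, and the exact form of the consistency term are all assumptions chosen for illustration. The consistency term here is a soft cross-entropy between teacher and student class distributions, weighted by the usual focal factor `(1 - p) ** gamma`; it reduces to the standard focal loss when the teacher distribution is one-hot.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Blend student weights into the teacher with an exponential moving average.

    Both arguments map parameter names to floats (a stand-in for real tensors).
    Returns new teacher weights: alpha * teacher + (1 - alpha) * student.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def focal_consistency(teacher_probs, student_probs, gamma=2.0, eps=1e-12):
    """Focal-weighted soft cross-entropy for one predicted box.

    Hypothetical form, assumed for illustration: each class term is scaled by
    (1 - p) ** gamma, so classes the student already predicts confidently
    contribute little, instead of being cut off by a hard threshold.
    """
    loss = 0.0
    for q, p in zip(teacher_probs, student_probs):
        loss += -q * (1.0 - p) ** gamma * math.log(p + eps)
    return loss


if __name__ == "__main__":
    # The teacher moves a small step toward the student at each iteration.
    teacher = ema_update({"w": 1.0}, {"w": 0.0}, alpha=0.9)
    print(teacher)

    # An easy example (student agrees with the teacher) is down-weighted
    # relative to a hard example (student is unsure).
    print(focal_consistency([1.0, 0.0], [0.9, 0.1]))
    print(focal_consistency([1.0, 0.0], [0.5, 0.5]))
```

With `gamma=0` the focal factor disappears and the term becomes plain cross-entropy, which makes it easy to compare against an unweighted consistency loss.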