Title
InsCon: Instance Consistency Feature Representation via Self-Supervised Learning
Authors
Abstract
Feature representation via self-supervised learning has achieved remarkable success in image-level contrastive learning, bringing impressive performance on image classification tasks. However, image-level feature representation focuses mainly on contrastive learning within a single instance and ignores the objective differences between the pretext task and downstream prediction tasks such as object detection and instance segmentation. To fully unleash the power of feature representation on downstream prediction tasks, we propose a new end-to-end self-supervised framework called InsCon, which is devoted to capturing multi-instance information and extracting cell-instance features for object recognition and localization. On the one hand, InsCon builds a targeted learning paradigm that takes multi-instance images as input and aligns the learned features between corresponding instance views, which makes it more appropriate for multi-instance recognition tasks. On the other hand, InsCon introduces cell-instance pull and push, which utilizes cell consistency to enhance fine-grained feature representation for precise boundary localization. As a result, InsCon learns multi-instance consistency in semantic feature representation and cell-instance consistency in spatial feature representation. Experiments demonstrate that the proposed method surpasses MoCo v2 by 1.1% AP^{bb} on COCO object detection and 1.0% AP^{mk} on COCO instance segmentation using a Mask R-CNN R50-FPN network with 90k iterations, and by 2.1% AP^{bb} on PASCAL VOC object detection using a Faster R-CNN R50-C4 network with 24k iterations.
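The "pull and push of cell-instance" described in the abstract can be pictured as an InfoNCE-style consistency loss over corresponding cell features from two augmented views: matching cells are pulled together while all other cells act as negatives. The sketch below is a minimal NumPy illustration under that reading, not the paper's implementation; the flattened cell-feature shape `(N, D)`, the temperature value, and the assumption that row `i` of each view corresponds to the same spatial cell are all illustrative choices.

```python
import numpy as np

def cell_consistency_loss(cells_q, cells_k, temperature=0.2):
    """InfoNCE-style loss over cell features from two augmented views.

    cells_q, cells_k: arrays of shape (N, D), one row per spatial cell;
    row i of each view is assumed to depict the same image region.
    Corresponding cells form positive pairs (the "pull"); every other
    cell in the opposite view is a negative (the "push").
    """
    # L2-normalize so dot products become cosine similarities.
    q = cells_q / np.linalg.norm(cells_q, axis=1, keepdims=True)
    k = cells_k / np.linalg.norm(cells_k, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: cell i in view q vs. cell i in view k.
    return -np.mean(np.diag(log_prob))
```

Under this formulation, perfectly aligned cell features yield a lower loss than misaligned ones, which is what drives the encoder toward spatially consistent (cell-level) representations.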